Understanding Common Crawl Errors
Search Engine Visibility V1 crawls your site using a web crawler that makes HTTP requests to the server that hosts your site.
Sometimes Search Engine Visibility V1 cannot complete the crawl. The following sections describe the most common reasons a crawl results in an error.
Your home page returns an HTTP status other than 200. A 200 status means the request succeeded; any other final status means the page could not be retrieved as expected. When this happens, most search engines can't index the home page's content or follow its links. Redirects are OK as long as they end in a 200 status.
Your home page loops indefinitely in a redirect. Most search engines consider pages like this broken because they can't get to valid content. Some common causes of this error are:
- Your site requires cookies.
- Your site has incorrectly configured redirects (e.g., http://coolexample.com/ redirects to http://coolexample.com/a/, but then http://coolexample.com/a/ redirects back to http://coolexample.com/). Be sure that all redirects end in a 200 status.
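The redirect checks described above can be sketched in Python. This is only an illustration and not how Search Engine Visibility V1 works internally: the `fetch` callable, the `trace_redirects` name, and the limit of 10 hops are all assumptions made for the example. `fetch(url)` is expected to return a `(status, location)` pair for a single request without following redirects itself.

```python
from urllib.parse import urljoin

REDIRECT_STATUSES = (301, 302, 303, 307, 308)


def trace_redirects(start_url, fetch, max_hops=10):
    """Follow redirects from start_url and report how the chain ends.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location).
    Returns (outcome, chain), where outcome is one of
    "ok", "loop", "error", or "too_many_hops".
    """
    chain = [start_url]
    seen = {start_url}
    url = start_url
    for _ in range(max_hops):
        status, location = fetch(url)
        if status == 200:
            return "ok", chain
        if status in REDIRECT_STATUSES and location:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            chain.append(url)
            if url in seen:
                return "loop", chain
            seen.add(url)
            continue
        # Any other status (or a redirect without Location) ends the chain.
        return "error", chain
    return "too_many_hops", chain
```

With a dictionary of canned responses standing in for a real server, `trace_redirects("http://coolexample.com/", responses.get)` reports a loop for the misconfiguration described in the list, because the chain returns to a URL it has already visited.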
Your home page is not identifying itself as an HTML page. Most search engines only index and follow links on HTML pages. This error means the Content-Type value in your HTTP response header does not start with text/html or application/xhtml.
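As a rough illustration of the rule above, the following Python sketch tests whether a Content-Type header value starts with text/html or application/xhtml. The function name `is_html_content_type` is hypothetical, and ignoring case and any trailing parameters such as `charset` is an assumption about how a crawler would likely treat the header.

```python
def is_html_content_type(header_value):
    """Return True if a Content-Type value identifies an HTML page."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type.startswith(("text/html", "application/xhtml"))
```

For example, `text/html; charset=UTF-8` and `application/xhtml+xml` pass this check, while `application/json` or a missing header does not.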
Your root URL's format is invalid. Most search engines will not crawl it. Some common causes are:
- The URL contains invalid characters.
- The URL is longer than 500 characters.
- The URL contains a username or password (e.g., http://email@example.com/).
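The causes listed above can be checked with Python's standard `urllib.parse` module. This is a minimal sketch under stated assumptions: the `root_url_problems` name is hypothetical, and "invalid characters" is interpreted here as whitespace or anything outside printable ASCII, which may differ from the rule the crawler actually applies.

```python
from urllib.parse import urlsplit

MAX_URL_LENGTH = 500  # limit taken from the list above


def root_url_problems(url):
    """Return a list of reasons the root URL may be rejected (empty if none)."""
    problems = []
    if len(url) > MAX_URL_LENGTH:
        problems.append("too long")
    # Assumption: whitespace and non-printable-ASCII characters are invalid.
    if any(ch.isspace() or not (0x20 < ord(ch) < 0x7F) for ch in url):
        problems.append("invalid characters")
    try:
        parts = urlsplit(url)
    except ValueError:
        problems.append("unparseable")
        return problems
    if parts.scheme not in ("http", "https") or not parts.hostname:
        problems.append("missing scheme or host")
    if parts.username is not None or parts.password is not None:
        problems.append("contains credentials")
    return problems
```

Calling `root_url_problems("http://email@example.com/")` flags the URL as containing credentials, because everything before the `@` in the host part is parsed as a username.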
We didn't receive a response from your server when we requested your home page. As a result, search engines can't crawl your site. Some common causes are:
- You set up a new domain for hosting within the last 48 hours and DNS is still propagating. Once your DNS propagates, we recommend recrawling your website with Search Engine Visibility V1.
- The URL and/or domain does not exist.
Your robots.txt file explicitly blocks Search Engine Visibility V1 from crawling your home page, your home page is redirecting to a page that is blocked, or your home page's content location is blocked.
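To see whether a set of robots.txt rules blocks a given URL, Python's standard `urllib.robotparser` module can evaluate them locally. In the sketch below, `is_blocked` is a hypothetical helper and `ExampleCrawler` is a placeholder user agent; the actual user agent string used by Search Engine Visibility V1 is not stated here, so substitute the real one when testing.

```python
from urllib.robotparser import RobotFileParser


def is_blocked(robots_txt, user_agent, url):
    """Return True if the robots.txt text disallows user_agent from url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# Example rules that block every crawler from the whole site.
rules = "User-agent: *\nDisallow: /\n"
home_page_blocked = is_blocked(rules, "ExampleCrawler", "http://coolexample.com/")
```

If your home page redirects, run the same check against the redirect target as well, since a blocked destination produces the same error.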
Your server committed an HTTP protocol violation, so we could not properly crawl your home page. As a result, search engines can't crawl your site.
We were unable to complete your site crawl within a four-hour period. Search engines can't crawl your site properly, which means many of your pages aren't being crawled or indexed. Some common causes are:
- Your site can't respond to concurrent HTTP requests.
- HTTP requests to your pages repeatedly time out.
- HTTP requests to your pages repeatedly return 502, 503, or 504 HTTP statuses.
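If you keep your own log of request results, a small tally can show whether timeouts or 502, 503, and 504 responses dominate. The following Python sketch assumes a hypothetical log format of `(url, status)` pairs in which `status` is `None` for a request that timed out; neither the format nor the `summarize_failures` name comes from Search Engine Visibility V1.

```python
from collections import Counter

TRANSIENT_STATUSES = {502, 503, 504}


def summarize_failures(results):
    """Count timeouts and 502/503/504 responses in (url, status) pairs."""
    counts = Counter()
    for _url, status in results:
        if status is None:  # assumption: None marks a timed-out request
            counts["timeout"] += 1
        elif status in TRANSIENT_STATUSES:
            counts["gateway_or_unavailable"] += 1
        else:
            counts["other"] += 1
    return dict(counts)
```

A high share of timeouts or gateway errors relative to other results suggests the server is struggling under the crawl's request rate.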