Saturday, April 19, 2008

Tutorial: Search Engines Won't Index My Website


Search Engines Won’t Index Your Blog

I have a couple of blogs that the search engines will not index. When I run them through my WEBCEO software, it shows the sites as indexed on Google and Yahoo, plus a few others, but when I check the search engines myself, the sites are not there.

In fact, there are websites with a PageRank of 3 (PR3) that are still not indexed. Not being indexed does not necessarily mean the webmaster has done something wrong.

SiteMap

There are several ways to get a site indexed. One way is to add a site map to the website, although this might not always be possible. If you have a Google account, try adding your FEED file and let Google use that as the site map. As with all Google tools, it doesn’t always work; I find that it works on two-thirds of my blogs.
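If the site has no site map yet, one can be generated with a few lines of Python. This is only a rough sketch: the page addresses are made-up placeholders, and it writes just the `<loc>` entry for each page, which is the minimum the sitemaps.org format asks for.

```python
from xml.etree import ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(urls):
    """Return a sitemap XML string with one <url><loc> entry per address."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")

# Hypothetical page addresses; replace them with the site's real pages.
pages = [
    "http://www.example.com/",
    "http://www.example.com/about.html",
]
sitemap_xml = build_sitemap(pages)
```

Save the result as sitemap.xml in the root of the site, then submit that address in the sitemap tool.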

Google has a hard time verifying its own Blogger blogs. I’ve followed 3 sets of instructions and none of them worked. However, I verified Drupal and WordPress sites without any problems. If you can get your site verified, use the sitemap tool to get it indexed.

I did a little twist and got one of my sites indexed. The site was http://work-at-home-guru.blogspot.com/ and WEBCEO said 61 pages were indexed. I did a site and link search and found almost 70 pages. But when an advertiser did their own search, they didn’t find that any search engine had indexed the site. Google wouldn’t let me add /feed to the sitemap. I tried to verify the site, and that wouldn’t work either.

So, I made an RSS feed at www.feedburner.com and let Google search that. Do not worry about errors. The objective is not to get the site indexed well, but to make Google take a look at the site.

AdWords

If the site is commercial and needs to be indexed quickly, then join Google’s PPC program, AdWords. Their rules state that there is no minimum, but people I know who have AdWords accounts state that the minimum is $50. This is not bad because you get advertising for it.

The benefit of this program is that Google must index the site before finding websites to advertise on. That means an instant index.

Problems

There are a few reasons why a website may not be indexed. The first is broken links. Search engines may stop at a broken link and assume the site is down. They may even remove a site that was previously indexed.
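Broken links are easy to hunt down with a small script. The sketch below is one possible approach, not the only one: it asks each address for its HTTP status and reports anything that fails or comes back as 400 or higher. The function names are my own, and the list of links is something you would have to collect from your pages first.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the server cannot be reached."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error code such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # No answer at all, or the address is malformed.
        return None

def find_broken_links(links, get_status=http_status):
    """Return the links whose status is missing or 400 and above."""
    broken = []
    for link in links:
        status = get_status(link)
        if status is None or status >= 400:
            broken.append(link)
    return broken
```

Calling `find_broken_links(list_of_links)` returns the addresses that need fixing. Some servers refuse HEAD requests, so a link reported as broken is worth checking by hand before it is removed.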

Some search engines, such as Google and HotBot, will not index sites with no inbound links. One way to get indexed is to link to a website or blog on a page ‘other’ than the index page.

Flash intros, videos and images may result in the spiders skipping a website. The index page must link to the rest of the site. If it does not, as in the case of Flash intros, there are no links from the index page to the inside pages. The result is that search engines will not index the site.

Most search engines will not index a free site.

Linking to a banned or benched website may result in the search engines refusing to index a site.

Submitting the site monthly can result in the site being ‘pruned’ from the database, or just not indexed in the first place.

Dynamic URLs are often overlooked. Any URL that contains symbols such as ? and & is often ignored.
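To see which addresses on a site count as dynamic, look for a query string, which is the part after the ? mark. Here is a small sketch using Python's standard URL tools; the example addresses are invented for illustration.

```python
from urllib.parse import parse_qsl, urlsplit

def is_dynamic_url(url):
    """Return True if the address carries a query string (the part after '?')."""
    return bool(urlsplit(url).query)

def query_parameters(url):
    """Return the name/value pairs found in the query string."""
    return parse_qsl(urlsplit(url).query)

# Hypothetical addresses: one dynamic, one static.
dynamic = "http://www.example.com/page.php?id=7&cat=2"
static = "http://www.example.com/articles/indexing.html"

dynamic_flag = is_dynamic_url(dynamic)
static_flag = is_dynamic_url(static)
```

Any page that shows up as dynamic is a candidate for a plain, static address that the spiders are less likely to skip.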

Only text is indexed. If the page has no text, then there is no index. Search engines cannot enter passwords, fill in forms, or sign up for RSS or newsletters. If the website asks people to do any of these before continuing, the sub pages will not be indexed.

Search engines will not index large pages, or ones that load slowly. No web page should be more than 50 KB.
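Checking a page against the size limit takes only a few lines. In this sketch I am assuming that 50k means 50 × 1024 bytes and that only the HTML itself is counted, not the images or scripts it pulls in.

```python
# Assumption: "50k" is read as 50 * 1024 bytes of HTML.
PAGE_LIMIT_BYTES = 50 * 1024

def page_size_report(html, limit=PAGE_LIMIT_BYTES):
    """Return (size in bytes, True if the page is within the limit)."""
    size = len(html.encode("utf-8"))
    return size, size <= limit

small_page = "<html><body><p>Short page</p></body></html>"
size, within_limit = page_size_report(small_page)
```

Feed it the saved source of a page; if `within_limit` comes back False, the page is a candidate for splitting or trimming.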

404 message. Web hosting companies are not up 100% of the time. If the site is down when the robot comes looking, it may ignore the website.

Scripts cannot be indexed. A scripted menu is harder to index than a static one written in HTML.

Sometimes frames and scripts can make a search engine think it is at the end of the site and leave.

The template may be so complex, with far too many tags, that it is difficult to get indexed. The webmaster may need to change templates.

What Next?

If the site is still not indexed and has a blog in it, try to manually ping the blog to a service such as pingoat.com or some of the others.
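Many ping services accept a standard XML-RPC call named weblogUpdates.ping, which takes the blog name and the blog address. Whether a particular service accepts it, and at what endpoint, is something to confirm on that service's own pages, so the endpoint below is left as a parameter and the blog details are placeholders. Treat this as a sketch of the idea only.

```python
import xmlrpc.client

def build_ping_payload(blog_name, blog_url):
    """Return the XML-RPC request body for a weblogUpdates.ping call."""
    return xmlrpc.client.dumps((blog_name, blog_url), methodname="weblogUpdates.ping")

def send_ping(endpoint, blog_name, blog_url):
    """Send the ping to an XML-RPC endpoint and return the service's reply.

    The endpoint address is an assumption; look it up for the service you use.
    """
    server = xmlrpc.client.ServerProxy(endpoint)
    return server.weblogUpdates.ping(blog_name, blog_url)

# Hypothetical blog details, used only to show what the request looks like.
payload = build_ping_payload("My Blog", "http://www.example.com/blog/")
```

Printing `payload` shows the exact XML that would be sent, which is handy when a service rejects the ping.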

How to Submit

There are right ways, and wrong ways, to submit to search engines. Most websites are indexed organically. The search engines follow a link from one site to another, delving deep into the site. The indexing times can take anywhere from days to months.

One urban legend was Google’s sandbox. This doesn’t exist in the true sense. It just takes time for a site to be indexed. Most search engines may index only 300 – 400 pages. Getting the site indexed deeper can involve several techniques including setting up sub-sections and sub-domains and submitting them to search engines, individual page submission, and linking to that page from an outside source.

Submitting one page at a time which links to other pages is called creating a "hallway page." This will get previously ignored web pages indexed, and it may also improve the ranking. It is important to remember that web pages found organically receive more value than ones submitted. So letting Google follow links to un-indexed pages increases their rank.

Avoid redirects, no-follow links, and meta refresh code. Most search engines will refuse to index these web pages.
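No-follow links and meta refresh tags are easy to miss inside a template, so it can help to scan the page source for them. The sketch below uses Python's built-in HTML parser; the class name and the sample page are my own inventions for illustration.

```python
from html.parser import HTMLParser

class IndexingBlockerFinder(HTMLParser):
    """Collect no-follow links and meta refresh tags found in a page."""

    def __init__(self):
        super().__init__()
        self.nofollow_links = []
        self.meta_refresh = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value arrive as None; treat them as empty text.
        attrs = {name: (value or "") for name, value in attrs}
        if tag == "a" and "nofollow" in attrs.get("rel", "").lower().split():
            self.nofollow_links.append(attrs.get("href", ""))
        elif tag == "meta" and attrs.get("http-equiv", "").lower() == "refresh":
            self.meta_refresh.append(attrs.get("content", ""))

# Hypothetical page source.
sample_page = """
<html><head>
<meta http-equiv="refresh" content="0; url=http://www.example.com/new.html">
</head><body>
<a href="http://www.example.com/archive.html">Archive</a>
<a rel="nofollow" href="http://www.example.com/partner.html">Partner</a>
</body></html>
"""

finder = IndexingBlockerFinder()
finder.feed(sample_page)
finder.close()
```

After the run, `finder.nofollow_links` and `finder.meta_refresh` list what was found, so an empty result for both means the page is clear of these two problems.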

If a web page does not change for several months, then search engines may purge it from their databases. Search engines love new content. That is why optimization must be done every few months. An old page may be re-submitted after part of the content or page is changed.

Text

Make sure the web pages are not full of code that cannot be indexed. Search engines will not look through a page indefinitely. They have to get past affiliate codes, meta info, heading info, dynamic menus, graphics, video, and links before they finally reach the content. If that takes too long, they may leave.
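One rough way to judge whether a page is mostly code is to compare the amount of readable text with the total size of the source. The sketch below does that with Python's built-in HTML parser. The ratio is my own yardstick, not something the search engines publish, so use it to compare your pages with each other rather than against a fixed target.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the readable text of a page, skipping scripts and styles."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def visible_text(html):
    """Return the readable text of the page as one string."""
    extractor = TextExtractor()
    extractor.feed(html)
    extractor.close()
    return " ".join(extractor.chunks)

def text_ratio(html):
    """Return readable text length divided by total source length."""
    return len(visible_text(html)) / len(html) if html else 0.0

# Hypothetical page source.
sample = "<html><head><script>var x = 1;</script></head><body><p>Hello world</p></body></html>"
ratio = text_ratio(sample)
```

A page with a very low ratio is carrying a lot of markup for very little content, which is a hint that the template needs slimming down.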

Robot File

All sites should have a robot file (robots.txt). It tells the search engines which parts of the site they may crawl and which to skip, so they spend their time on the information that is important.
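Assuming the robot file here is the standard robots.txt, it is worth testing that it does not block pages by accident. Python ships with a parser for the format, so a quick check might look like the sketch below. The rules and addresses are made-up examples.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every robot may crawl everything except /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def crawl_permissions(robots_txt, urls, user_agent="*"):
    """Return a dict mapping each address to True if robots may fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}

permissions = crawl_permissions(
    ROBOTS_TXT,
    [
        "http://www.example.com/index.html",
        "http://www.example.com/private/notes.html",
    ],
)
```

If a page you want indexed comes back as False, the robot file is the first thing to fix.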

Archives

Archives can make it easier for search engines to dig deeper into the website. Check your CODE. Did the template designer put a no-follow code in it?
