“Search Engines do not like ? and such in the address.”
“People have been discussing how search engines do not like ? marks etc. and how they won't index a page with those things”
“..google turns crazy if it spiders the same document with all time different content”
Over the past 3 years of working with osCommerce, I’ve heard misconceived comments like these over and over again. It seems that new storeowners often confuse different issues, and their questions and comments, such as the ones above, spread like wildfire to other storeowners terrified that they might not be indexed properly by search engines.
For a little history, the term SEF (Search Engine Friendly) URLs came about before 1997. Google and Yahoo were just beginning to get going, and they created automated programs called ‘spiders’ to traverse the web, looking for webpages to add to their index. Nearly all webpages at the time were static HTML; dynamic content webpages were just starting to pop up around the web. The traditional URL for a static HTML website had always looked like this.
http://www.yoursite.com/category/your_product.html
Search engines indexed page after page of these static HTML pages with no problem. However, static content webpages were laborious to maintain, and webmasters soon developed a way to have one page display different content depending on certain logical conditions. Early dynamic content webpages were mostly written in Perl and ASP, and their URLs looked more like this.
http://www.yoursite.com/category/your_product.pl?product_model=101
In the early days, this caught ‘spiders’ off guard a bit, as they were not written to ‘handle’ the special characters ‘?’ and ‘=’ in URLs. They ended up either truncating everything after the first special character or ignoring the URL entirely. This meant that most dynamic URLs would not be included in the search engines’ indexes.
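To make the two failure modes concrete, here is a rough Python sketch of how such an early spider might have treated a URL. The function name `early_spider_view` and its `mode` argument are purely illustrative assumptions; no real crawler's code is being reproduced here.

```python
def early_spider_view(url, mode="truncate"):
    """Illustrative only: mimic a spider that cannot handle '?' or '='.

    mode="truncate" drops everything from the first special character on;
    mode="ignore" skips the URL entirely (returns None).
    """
    # Positions of the special characters that are actually present.
    positions = [i for i in (url.find("?"), url.find("=")) if i != -1]
    if not positions:
        return url  # a plain static URL passes through untouched
    if mode == "truncate":
        return url[:min(positions)]
    return None  # mode == "ignore"


dynamic = "http://www.yoursite.com/category/your_product.pl?product_model=101"
print(early_spider_view(dynamic))            # the query string is lost
print(early_spider_view(dynamic, "ignore"))  # the URL is dropped altogether
```

With truncation, every product served by the same script collapses into one URL, so at best a single product page could end up in the index.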
Developers came to realize the issue before search engines could address it, and devised a means to ‘re-write’ the URL to make it appear as though it were a static URL, without the special characters. This was a very good solution for a couple of years, and dynamic content webmasters were then able to have all of their product pages indexed by search engines.
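The idea behind such a rewrite can be sketched in a few lines of Python. This is only a minimal illustration, assuming a made-up static-looking pattern of `/category/your_product/<model>.html`; the pattern and the helper names `to_static` and `to_dynamic` are hypothetical, and in practice the translation was usually done by the web server or the shopping cart software rather than by code like this.

```python
import re
from urllib.parse import urlencode

# Hypothetical static-looking pattern: /category/your_product/<model>.html
STATIC_PATTERN = re.compile(r"^/category/your_product/(?P<model>\d+)\.html$")


def to_static(model):
    """Build the static-looking link that is shown to visitors and spiders."""
    return f"/category/your_product/{model}.html"


def to_dynamic(path):
    """Translate a static-looking path back into the dynamic URL the script
    actually understands. Returns None if the path does not match."""
    match = STATIC_PATTERN.match(path)
    if match is None:
        return None
    query = urlencode({"product_model": match.group("model")})
    return "/category/your_product.pl?" + query


link = to_static(101)
print(link)              # what the spider sees
print(to_dynamic(link))  # what the server runs internally
```

The spider only ever sees the first form, which contains no ‘?’ or ‘=’, while the server quietly maps it back to the dynamic script.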
Sometime in 2001, however, Google resolved the issue with indexing URLs with special characters, and dynamic content websites with URLs still containing special characters began to pop up all over the place. All of the other major search engines followed suit. Today, in a search for specific products, nearly half of the URLs returned in a query from Google contain special characters. And there doesn’t seem to be any preference given to a URL without special characters over a URL that has them.
http://www.google.com/search?hl=en&lr=&q=buy+world+of+warcraft
As you can see, Google themselves use URLs with special characters in them. Some might argue that Google is not the only search engine out there, and that there may still be search engines out there that do not index URLs with special characters in them. However, Google, MSN, and Yahoo all do. And 99.99% of all of your traffic will come from these three engines. In fact, I even searched a few dozen of the smaller search engines, and I have yet to find a single search engine that still does not support URLs with special characters. If you can find one, I’d like to see it. Personally, I don’t think there are any left, or at least none that you'd actually get traffic from.
Don’t get me wrong, there are other considerations to take into account when talking about ‘optimizing’ your URLs to improve your search engine listings. I’ll discuss some of these ideas in my next article, including SIDs, key phrase embedding, key phrase proximity, and URL length.
But there is no reason to fear that your product pages will not be indexed because you have special characters in your URLs. It simply isn’t true anymore.
