One of the things that seems to be sorely lacking from just about every osC store, or osC derivative, is support for collaborative filtering. Collaborative filtering is the dynamic suggestion of products based on a customer’s browsing preferences, compared against those of other customers who have similar interests.
For example, osCommerce lets registered users rate and review products, and displays these ratings and reviews to help you, as a new customer, get an idea of how other people enjoyed the product. Taking this concept a bit further, I’ve seen some osCommerce stores offer an average rating from all of the customers who have rated the product, so that you can quickly get an idea of what the average customer thinks. However, your interests might not be the same as the average customer’s interests, so their rating might not be as relevant to you.
A very good example of this is www.netflix.com. Netflix is an online DVD rental service. Netflix takes the ratings that you provide and gives you recommendations based on the kinds of movies that you like, and on movies liked by other people who also like the kinds of movies that you like.
Sound complicated? Get used to it. In 5 years, I suspect that every major online retailer out there is going to use this type of powerful targeted marketing. If you only like to watch ‘Sci-Fi’ movies, doesn’t it make sense that when you visit my site, I display suggestions for movies in that genre?
With this kind of enhanced suggestion marketing available, merchants can boost their conversion ratios. And with pay-per-click and other web advertising methods becoming more and more expensive, I’d want to be as specific as possible with the products I display, rather than randomly throwing up any product that strikes my fancy.
Obviously, using a customer’s ratings, and comparing those against categories, manufacturers, and the ratings of other people with similar interests, is one way to provide collaborative filtering. I’m sure there are others. I’d like to come up with a few of my own algorithms, and try to package them into one of the osC projects I’m working on. I’ll let you know how it goes. If you have any ideas, or experience working with collaborative filtering, let me know.
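To make the ratings-comparison idea a little more concrete, here is a minimal sketch of one common approach (user-based filtering with a similarity-weighted average). It is written in Python rather than osC’s PHP, and everything in it is an assumption for illustration: the customer names, the movie ratings, the 1-5 rating scale, and the function names. It is not code from osCommerce or from Netflix.

```python
from math import sqrt

# Hypothetical data: customer -> {product: rating on an assumed 1-5 scale}.
RATINGS = {
    "alice": {"Alien": 5, "Blade Runner": 5, "Notting Hill": 1},
    "bob":   {"Alien": 5, "Blade Runner": 4, "The Matrix": 5},
    "carol": {"Notting Hill": 5, "Alien": 1, "Love Actually": 5},
}
MIDPOINT = 3  # neutral rating; scores above it mean "liked"


def centered(ratings):
    """Shift ratings so that liked items are positive and disliked negative."""
    return {product: score - MIDPOINT for product, score in ratings.items()}


def similarity(a, b):
    """Cosine similarity over the products both customers have rated."""
    common = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in common)
    norm = sqrt(sum(a[p] ** 2 for p in common)) * sqrt(sum(b[p] ** 2 for p in common))
    return dot / norm if norm else 0.0


def recommend(customer, ratings=RATINGS):
    """Predict ratings for products the customer has not rated, best first."""
    mine = centered(ratings[customer])
    totals, weights = {}, {}
    for other, raw in ratings.items():
        if other == customer:
            continue
        theirs = centered(raw)
        sim = similarity(mine, theirs)
        if sim == 0.0:
            continue  # nothing in common, so this customer tells us nothing
        for product, score in theirs.items():
            if product in mine:
                continue  # already rated by this customer
            totals[product] = totals.get(product, 0.0) + sim * score
            weights[product] = weights.get(product, 0.0) + abs(sim)
    predicted = {p: MIDPOINT + totals[p] / weights[p] for p in totals}
    return sorted(predicted.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for product, score in recommend("alice"):
        print(f"{product}: predicted rating {score:.1f}")
```

In this toy data, the Sci-Fi fan is steered toward the title that the like-minded customer enjoyed, and away from the title favored by the customer with opposite tastes. A real store would need far more ratings per customer before the similarities meant much.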
Friday, August 26, 2005
Thursday, August 4, 2005
Prevent Spider Sessions = TRUE
Continuing our discussion of SEF (search engine friendly) URLs, another common misconception is that URLs with SIDs (session IDs) are not SEF. I often hear people ask how to get rid of the SID in their URLs so that search engine spiders will index them, or rank them higher.
The osCommerce application uses cookies to keep track of what customers have placed in their shopping cart. But some customers have their browsers configured for higher security and, by default, do not allow websites to place cookies on their PC. It is for these customers that the SID in the URL is created. For them, the SID is passed from URL to URL, rather than being stored in a cookie, to keep their ‘session’ intact so that osC can remember what the customer has in their shopping cart.
The SID in the URL, however, is not a methodology without flaws. As you may know, spiders do not accept cookies. Therefore, when they visit your site, they are assigned SIDs through the URL. One of the biggest issues with SIDs in the URL is that they can cause search engine spiders to go into an ‘infinite loop’.
Think of a search engine spider program as an iteration through an array. First, the spider crawls through the webpage, looking for any URLs it can find. It adds all of the URLs it finds to an array. Then it iterates through the array, visiting each URL one at a time. After visiting all of the URLs, the search engine will usually go away for an indeterminate amount of time and then return, going back through the site again, looking for any new URLs it might have missed the first time around, and adding them to the array. This is where SIDs in the URL trip the search engines up. On the second visit, the spider is assigned a new SID, so every link it sees is interpreted as a new URL and is therefore added to the array. Since this happens again each time the spider visits your website, the spider never gets to finish iterating through the array. I’ve seen firsthand spiders like ‘Googlebot’ consume gigabytes of bandwidth while stuck in this endless loop.
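The array analogy above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not a model of any real spider: the page names, the `osCsid` parameter name, and the `crawl` function are made up for illustration, and each crawl pass is simply given a fresh SID.

```python
import itertools

# Hypothetical store with three pages.
PAGES = ["/index.php", "/product_info.php", "/shopping_cart.php"]


def crawl(passes, sessions_in_url):
    """Return the set of distinct URLs a spider has collected after N passes."""
    known_urls = set()
    sid_counter = itertools.count(1)
    for _ in range(passes):
        # The spider holds no cookie, so each visit starts a new session.
        sid = f"sid{next(sid_counter)}"
        for page in PAGES:
            url = f"{page}?osCsid={sid}" if sessions_in_url else page
            known_urls.add(url)  # a new SID makes the same page look new
    return known_urls


if __name__ == "__main__":
    print(len(crawl(5, sessions_in_url=True)))   # grows with every pass
    print(len(crawl(5, sessions_in_url=False)))  # stays at the real page count
```

With SIDs in the URLs, the list of "known" URLs grows by the full size of the site on every pass. Without them, it settles at the real number of pages.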
Another issue with spiders being assigned SIDs through URLs is that indexed sessions can be ‘hijacked’. For example, a spider crawls your website, gets assigned SIDs in the URL, and these URLs with SIDs in them actually make it into the search engine’s index. A customer finds your listing in the search engine, and clicks on the link with the SID in it. The customer likes your store, and decides to purchase. Then another customer finds your listing in the search engine, and clicks on the same link. Because the link carries the same SID that the first customer used, osC gets confused and thinks that the second customer is the first customer, and can sometimes even display sensitive information from the first customer.
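The hijacking scenario comes down to two visitors presenting the same key to the same session store. A minimal Python sketch, assuming a hypothetical in-memory store keyed by SID (the `SESSIONS` dictionary, the `handle_request` function, and the sample values are all illustrative, not osC internals):

```python
# Hypothetical session store: SID -> session data.
SESSIONS = {}


def handle_request(sid):
    """Return the session for this SID, creating an empty one if needed."""
    return SESSIONS.setdefault(sid, {"customer": None, "cart": []})


# The first customer follows the indexed link and starts shopping.
first = handle_request("abc123")
first["customer"] = "first@example.com"
first["cart"].append("Sci-Fi DVD")

# The second customer follows the very same indexed link.
second = handle_request("abc123")

if __name__ == "__main__":
    print(second["customer"], second["cart"])  # the first customer's data
```

Nothing in the store can tell the two visitors apart, because the SID in the URL is the only thing identifying them.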
So, how is the problem solved? One way would be to enable ‘Force Cookie Usage’ in the osC admin, and not allow customers to check out if they do not have cookies enabled. However, not being a strong advocate of turning my back on any potential customers, in July 2002 I suggested we use a script that looks at the visitor’s user agent to determine whether the SID is added to the URL ( http://forums.oscommerce.com/index.php?showtopic=31928&hl=security ). This suggestion was adopted nearly unchanged into the core code of osC with the release of osC 2.2 MS2, and is toggled in the admin section with the ‘Prevent Spider Sessions’ configuration option.
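The user-agent check can be sketched roughly as follows. This is a Python illustration of the idea only, not the PHP that ships with osC: the list of spider signatures, the function names, the parameters, and the `osCsid` parameter name are assumptions made for the example.

```python
# Illustrative, incomplete list of substrings that identify spiders.
SPIDER_SIGNATURES = ("googlebot", "slurp", "msnbot")


def is_spider(user_agent):
    """Guess whether the visitor is a spider from its User-Agent header."""
    agent = (user_agent or "").lower()
    return any(signature in agent for signature in SPIDER_SIGNATURES)


def build_url(path, sid, user_agent, has_cookie, prevent_spider_sessions=True):
    """Append the SID to a link only when the visitor actually needs it."""
    if has_cookie:
        return path  # the cookie already carries the session
    if prevent_spider_sessions and is_spider(user_agent):
        return path  # spiders get clean, stable URLs
    separator = "&" if "?" in path else "?"
    return f"{path}{separator}osCsid={sid}"


if __name__ == "__main__":
    bot = "Mozilla/5.0 (compatible; Googlebot/2.1)"
    shopper = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"
    print(build_url("/index.php", "abc123", bot, has_cookie=False))
    print(build_url("/index.php", "abc123", shopper, has_cookie=False))
```

The trade-off is that the check is only as good as the signature list. A spider that is not on the list is treated like a cookieless customer and still receives a SID.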
To this day, I have not seen a better method of preventing spiders from having SIDs assigned through URLs, and I recommend that any new store have ‘Prevent Spider Sessions’ set to true, and ‘Force Cookie Usage’ set to false, for maximum URL search engine friendliness.
