This is another “Grab Bag” video featuring Matt Cutts, head of Google’s Webspam team. Here he answers a question from a webmaster in the UK about whether the crawler can make inferences as it moves from page to page. Specifically, the question asked whether Googlebot, having crawled Page 1 and Page 2 of a site whose addresses name those pages explicitly, can guess that a third page exists at the corresponding URL.

Matt cannot confirm that this exact kind of inference takes place, but he does state that Googlebot makes a number of similar conjectures. The example he gives is parameter reduction. Many URLs on the Web are extremely long because of non-essential query parameters, such as user IDs appended to referral links. Google tries dropping sections of a web address to see whether the same page comes back; if it does, that section can be eliminated. This trial-and-error process continues until the crawler is left with the shortest URL that still returns the page. Reducing URLs this way prevents duplicates and makes addresses easier for end-users to remember.
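The trial-and-error process described above can be sketched in a few lines of Python. This is not Google’s actual implementation, just a minimal illustration of the idea: given a URL and some way to fetch page content, try removing each query parameter in turn and keep the removal whenever the page comes back unchanged. The `fetch` callable and the example parameter names (`id`, `ref`, `sessid`) are assumptions for the sake of the demo.

```python
from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def reduce_url(url, fetch):
    """Greedily drop query parameters that don't change the returned page.

    `fetch` is any callable that returns the content for a given URL
    (a real crawler would issue an HTTP request here).
    """
    parsed = urlparse(url)
    params = parse_qsl(parsed.query)
    baseline = fetch(url)  # content of the full, unreduced URL
    i = 0
    while i < len(params):
        # Build a trial URL with the i-th parameter removed.
        trial_params = params[:i] + params[i + 1:]
        trial = urlunparse(parsed._replace(query=urlencode(trial_params)))
        if fetch(trial) == baseline:
            params = trial_params  # non-essential parameter: drop it
        else:
            i += 1  # essential parameter: keep it, try the next one
    return urlunparse(parsed._replace(query=urlencode(params)))

# Hypothetical fetcher where only the "id" parameter affects the content.
def fake_fetch(url):
    q = dict(parse_qsl(urlparse(url).query))
    return "article-" + q.get("id", "")

short = reduce_url("http://example.com/page?id=7&ref=abc&sessid=xyz", fake_fetch)
print(short)  # → http://example.com/page?id=7
```

The referral tag and session ID are stripped because removing them returns the same content, while `id` survives because dropping it changes the page, which is exactly the positive/negative test Matt describes.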

Video Link:
Does Googlebot use inference when crawling? –