Google was the first company to patent a system that takes inbound links into account. The algorithm was named PageRank. In this section, we describe the algorithm and how it can influence search result ranking.
PageRank is estimated separately for each web page and is determined by the PageRank (citation) of the other pages that refer to it. It is a kind of “virtuous circle.” The main task is to find the criterion that determines page importance. In the case of PageRank, that criterion is how often a page is likely to be visited.
I shall now describe how a user’s behavior is modeled as he or she follows links to surf the network. It is assumed that the user starts viewing sites from some random page and then follows links to other web resources. There is always a possibility that the user will leave a site without following any outbound link and start viewing documents from a random page. The PageRank algorithm estimates the probability of this event as 0.15 at each step. The probability that our user continues surfing by following one of the links available on the current page is therefore 0.85, with every link on the page assumed equally likely to be chosen. If he or she continues surfing indefinitely, popular pages will be visited many more times than less popular pages.
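The surfing model described above can be simulated directly. The following is only a minimal sketch: the function name `simulate_surfer`, the four-page `graph`, and the choice to send the surfer to a random page whenever the current page has no outbound links are all assumptions made for illustration, not details given in the text.

```python
import random


def simulate_surfer(links, steps=200_000, jump_probability=0.15, seed=0):
    """Estimate how often each page is visited by one simulated random surfer."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # the surfer starts on a random page
    for _ in range(steps):
        visits[current] += 1
        outbound = links.get(current, [])
        # With probability 0.15 the surfer abandons the links and jumps to a
        # random page. Assumption: a page without links also forces a jump.
        if not outbound or rng.random() < jump_probability:
            current = rng.choice(pages)
        else:
            current = rng.choice(outbound)  # every link is equally likely
    return {page: count / steps for page, count in visits.items()}


# Hypothetical link graph: each page maps to the pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
frequencies = simulate_surfer(graph)
for page, share in sorted(frequencies.items(), key=lambda item: -item[1]):
    print(page, round(share, 3))
```

In this toy graph, page C receives links from three pages and page D from none, so C should end up with a much larger share of visits than D.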
The PageRank of a given web page is thus defined as the probability that the user is visiting that page at any given moment. It follows that the sum of these probabilities over all existing web pages is exactly one, because the user is assumed to be visiting exactly one page at any given moment.
Since it is not always convenient to work with these probabilities, the PageRank can be mathematically transformed into a number that is easier to read. For instance, we are used to seeing a PageRank number between zero and ten on the Google Toolbar.
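The text does not say which transformation is used, so the following is purely an illustration of one plausible option: a logarithmic scale. The function `toolbar_score` and its `min_probability` and `max_probability` bounds are hypothetical and are not taken from any published Google formula.

```python
import math


def toolbar_score(probability, min_probability, max_probability, scale=10):
    """Map a raw probability onto 0..scale using a logarithmic scale.

    Hypothetical mapping for illustration only; the real transformation
    behind the toolbar number is not described in the text.
    """
    if not 0 < min_probability < max_probability:
        raise ValueError("bounds must satisfy 0 < min_probability < max_probability")
    # Clamp so that values outside the chosen bounds still map to 0 or scale.
    clamped = min(max(probability, min_probability), max_probability)
    fraction = math.log(clamped / min_probability) / math.log(
        max_probability / min_probability
    )
    return round(scale * fraction)


# Example with made-up bounds spanning six orders of magnitude.
for p in (1e-9, 1e-6, 1e-3):
    print(p, toolbar_score(p, min_probability=1e-9, max_probability=1e-3))
```

A logarithmic scale is a natural guess because raw probabilities differ by many orders of magnitude between obscure and very popular pages, but any monotonic mapping would preserve the ordering.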
According to the ranking model described above:
- Each page on the Net (even if there are no inbound links to it) initially has a PageRank greater than zero, although it will be very small. There is a tiny chance that a user may accidentally navigate to it.
- Each page that has outbound links distributes part of its PageRank to the pages it references. The PageRank contributed to each linked-to page is inversely proportional to the total number of links on the linked-from page: the more links it has, the lower the PageRank allocated to each linked-to page.
- A “damping factor” is applied to this process so that the total distributed PageRank is reduced by 15%. This is equivalent to the probability, described above, that the user will not visit any of the linked-to pages but will navigate to an unrelated website.
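The three rules above can be sketched as a simple iterative computation. This is a minimal sketch under stated assumptions, not Google's implementation: the function name `pagerank`, the sample `graph`, the convergence tolerance, and the decision to spread the rank of pages without outbound links evenly over all pages are choices made here for illustration.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Iteratively compute PageRank for a dict mapping page -> linked-to pages."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # Assumption: rank held by pages with no outbound links is shared evenly.
        dangling = sum(rank[page] for page in pages if not links.get(page))
        # Rule 1: every page gets a small baseline, even with no inbound links.
        new_rank = {page: (1 - damping) / n + damping * dangling / n for page in pages}
        for source, targets in links.items():
            if not targets:
                continue
            # Rules 2 and 3: the damped rank is split equally among outbound links.
            share = damping * rank[source] / len(targets)
            for target in targets:
                new_rank[target] += share
        change = sum(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


# Hypothetical link graph: each page maps to the pages it links to.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(graph)
for page, value in sorted(ranks.items(), key=lambda item: -item[1]):
    print(page, round(value, 4))
```

Page D has no inbound links, so its value settles at the baseline of 0.15 divided by the number of pages, while the values of all pages together sum to one.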
Let us now see how this PageRank process might influence the process of ranking search results. We say “might” because the pure PageRank algorithm just described has not been used in the Google algorithm for quite a while now. We will discuss a more current and sophisticated version shortly. There is nothing difficult about the PageRank influence – after the search engine finds a number of relevant documents (using internal text criteria), they can be sorted according to the PageRank since it would be logical to suppose that a document having a larger number of high-quality inbound links contains the most valuable information.
Thus, the PageRank algorithm “pushes up” those documents that are also the most popular outside the search engine.
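As a rough illustration of this sorting step, the sketch below reorders a list of documents that have already been judged relevant. The function `rank_results`, the document names, and the scores are all made up for the example; how a real search engine combines PageRank with text relevance is not specified in the text.

```python
def rank_results(relevant_docs, pagerank_scores):
    """Order already-relevant documents by PageRank, highest first."""
    # Assumption: a document with no known score is treated as having zero.
    return sorted(
        relevant_docs,
        key=lambda doc: pagerank_scores.get(doc, 0.0),
        reverse=True,
    )


# Hypothetical documents found by text criteria, with made-up PageRank values.
found = ["doc-a", "doc-b", "doc-c"]
scores = {"doc-a": 0.10, "doc-b": 0.50, "doc-c": 0.30}
print(rank_results(found, scores))
```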