While reading this section, please bear in mind that it contains theoretical information rather than practical guidelines.
The main idea of the LocalRank algorithm consists of the following three steps:
1. An algorithm is used to select a certain number of documents relevant to the search query (let it be N). These documents are initially sorted by some criteria (this may be PageRank, relevance or a group of other criteria). Let us call the numeric value of this criterion OldScore.
2. Each of the N selected pages goes through a new ranking procedure and receives a new rank. Let us call it LocalScore.
3. The OldScore and LocalScore values for each page are multiplied to yield a new value, NewScore. The pages are finally ranked by NewScore.
The key procedure in this algorithm is the new ranking procedure, which gives each page a new LocalScore rank. Let us examine this new procedure in more detail:
0. An initial ranking algorithm is used to select N pages relevant to the search query. Each of the N pages is allocated an OldScore value by this algorithm. The new ranking algorithm only needs to work on these N selected pages.
1. While calculating LocalScore for a page, the system selects those pages among the N that link to this page. Let this set be M. Any other pages from the same host (as determined by IP address) and pages that are mirrors of the given page are excluded from M.
2. The set M is divided into subsets Li. These subsets contain pages grouped according to the following criteria:
- Pages on the same or similar hosts. Pages whose IP addresses share the same first three octets fall into one group, so all addresses in the range xxx.xxx.xxx.0 to xxx.xxx.xxx.255 are treated as belonging to one group.
- Pages that have the same or similar content (mirrors).
- Pages on the same site (domain).
3. Each page in each Li subset has an OldScore rank. The page with the largest OldScore is taken from each subset, and the rest of the pages are excluded from the analysis. This gives a subset K of pages that link to the page being ranked.
4. Pages in the subset K are sorted by OldScore in descending order, and only the first k pages (k is some predefined number) are kept in K. The rest of the pages are excluded from the analysis.
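5. LocalScore is calculated in this step by combining the OldScore values of the remaining k pages. In the patent, this combination is a sum of powers:
LocalScore(i) = OldScore(1)^m + OldScore(2)^m + ... + OldScore(k)^m
where OldScore(j) is the OldScore of the j-th page left in K and m is a predefined exponent (the patent suggests a value between one and three).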
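Steps 1 to 5 can be sketched in Python for a single page. This is only an illustration under stated assumptions: the function name, the dictionary-based inputs, the precomputed group labels in group_of, and the defaults for k and m are all hypothetical, and the combination in step 5 is taken to be a sum of OldScore values raised to an exponent m.

```python
def local_score(target, old_score, links_to, group_of, k=20, m=1):
    """Illustrative LocalScore for one page (not a reference implementation).

    target:    id of the page being re-ranked
    old_score: dict, page id -> OldScore, for the N selected pages
    links_to:  dict, page id -> set of page ids that page links to
    group_of:  dict, page id -> group label (same host, mirror or domain)
    k, m:      hypothetical cut-off and exponent
    """
    # Step 1: pages among the N that link to the target, excluding pages
    # in the target's own group (same host or mirrors of the target).
    backlinks = [
        p for p in old_score
        if p != target
        and target in links_to.get(p, set())
        and group_of[p] != group_of[target]
    ]

    # Steps 2-3: keep only the page with the largest OldScore in each group.
    best_in_group = {}
    for p in backlinks:
        g = group_of[p]
        if g not in best_in_group or old_score[p] > old_score[best_in_group[g]]:
            best_in_group[g] = p

    # Step 4: sort by OldScore (descending) and keep the first k pages.
    top_k = sorted(best_in_group.values(), key=old_score.get, reverse=True)[:k]

    # Step 5: combine the OldScore values (assumed: sum of m-th powers).
    return sum(old_score[p] ** m for p in top_k)


# Toy data: A and B share a group, D shares a group with the target T.
old_score = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "T": 2.5}
links_to = {"A": {"T"}, "B": {"T"}, "C": {"T"}, "D": {"T"}}
group_of = {"A": "g1", "B": "g1", "C": "g2", "D": "gT", "T": "gT"}

score = local_score("T", old_score, links_to, group_of, k=2, m=2)
print(score)  # 4.0**2 + 2.0**2 = 20.0
```

In the toy data, D is dropped because it shares a group with T, and B is dropped because A has a higher OldScore in the same group, so only A and C contribute.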
After LocalScore is calculated for each page from the set N, NewScore values are calculated and pages are re-sorted according to the new criteria. The following formula is used to calculate NewScore:
NewScore(i) = (a + LocalScore(i)/MaxLS) * (b + OldScore(i)/MaxOS)
i is the page for which the new rank is calculated.
a and b are numeric constants (the patent gives no further details about these parameters).
MaxLS is the maximum LocalScore among those calculated.
MaxOS is the maximum value among the OldScore values.
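The NewScore formula translates directly into a few lines of Python. The sketch below is illustrative only: the function name and the values a = b = 1 are assumptions, since no concrete values for the constants are given, and the guard against a zero maximum is an addition of this sketch.

```python
def new_scores(old_score, local_score, a=1.0, b=1.0):
    """Return (page, NewScore) pairs sorted by NewScore, highest first.

    old_score, local_score: dicts mapping page id -> score.
    a, b: hypothetical constants; no concrete values are given.
    """
    max_ls = max(local_score.values())
    max_os = max(old_score.values())
    scores = {}
    for page, os_value in old_score.items():
        # Normalise each score by its maximum; fall back to 0 if the
        # maximum is 0 (a guard added by this sketch).
        ls_term = local_score[page] / max_ls if max_ls else 0.0
        os_term = os_value / max_os if max_os else 0.0
        scores[page] = (a + ls_term) * (b + os_term)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


old = {"P1": 10.0, "P2": 8.0, "P3": 5.0}
local = {"P1": 0.0, "P2": 4.0, "P3": 2.0}
ranking = new_scores(old, local)
print([page for page, _ in ranking])  # ['P2', 'P3', 'P1']
```

P1 has the highest OldScore but no local support, so with these assumed constants it drops below both P2 and P3 after re-ranking.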
Now let us put the math aside and explain these steps in plain words.
In step 0) pages relevant to the query are selected. Algorithms that do not take into account the link text are used for this. For example, relevance and overall link popularity are used. We now have a set of OldScore values. OldScore is the rating of each page based on relevance, overall link popularity and other factors.
In step 1) pages with inbound links to the page of interest are selected from the group obtained in step 0). The group is whittled down by removing mirror and other sites in steps 2), 3) and 4) so that we are left with a set of genuinely unique sites that all share a common theme with the page that is under analysis. By analyzing inbound links from pages in this group (ignoring all other pages on the Internet), we get the local (thematic) link popularity.
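The grouping in step 2 can be illustrated with a small union-find sketch. It rests on one possible reading of the criteria, namely that pages sharing any one of them (the same /24 subnet, the same domain, or identical content) end up in the same subset, merged transitively. The function name, the input layout and the use of a content hash to detect mirrors are assumptions; detecting merely "similar" content is outside the scope of the sketch.

```python
import ipaddress


def group_pages(pages):
    """Map each page id to a group representative.

    pages: dict, page id -> (ip, domain, content_hash)  [hypothetical layout]
    """
    parent = {p: p for p in pages}

    def find(p):
        # Follow parent links to the root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    def union(p, q):
        parent[find(p)] = find(q)

    first_seen = {}  # grouping key -> first page seen with that key
    for page, (ip, domain, content_hash) in pages.items():
        # Same first three octets means same /24 network.
        subnet = ipaddress.ip_network(f"{ip}/24", strict=False)
        keys = (("subnet", subnet), ("domain", domain), ("content", content_hash))
        for key in keys:
            if key in first_seen:
                union(page, first_seen[key])
            else:
                first_seen[key] = page

    return {p: find(p) for p in pages}


pages = {
    "a": ("192.0.2.10", "example.com", "h1"),
    "b": ("192.0.2.200", "example.org", "h2"),   # same /24 as "a"
    "c": ("198.51.100.5", "example.org", "h3"),  # same domain as "b"
    "d": ("203.0.113.9", "example.net", "h4"),   # unrelated
}
groups = group_pages(pages)
print(groups["a"] == groups["b"] == groups["c"], groups["d"] != groups["a"])
```

Here "a", "b" and "c" collapse into one subset even though "a" and "c" share nothing directly, while "d" stays on its own. Only one page per subset would then go on to step 3.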
LocalScore values are then calculated in step 5). LocalScore is the rating of a page among the set of pages that are related by topic. Finally, pages are rated and ranked using a combination of LocalScore and OldScore.