Last update: 9 February 2011
The Elo rating system, developed by Arpad Elo in the early 1960s, is the first rating system based on a probabilistic approach.

A player does not get a subjective amount of points for his/her achievements. In such a system, a player would score, say, 120 points for being the Grofaz and another 85 for finishing second at ASLOK, whereas a third player would get 80 for winning a local tournament.

A well-known example of this approach is the ATP rating used in Tennis.

Here, the approach is very different. The idea is that a more skilled player will have a higher probability of beating a less skilled one, and ratings reflect that probability. Player X's rating is then representative of how likely he or she is to beat another player Y who has another given rating. This specific probability of winning is calculated using the formula below:

We = 1 / (1 + 10^((R2 - R1) / 400))

We is the specific win expectancy (probability), R1 is the player's current rating, and R2 the opponent's current rating.

In our previous X against Y example, a value of 0.62 for We would mean X should beat Y 62 times out of a hundred.
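As a minimal sketch of that computation (in Python; the function name and the 1785/1700 example ratings are assumptions for illustration), note that a gap of 85 rating points gives roughly the 0.62 of the example:

    def win_expectancy(r1, r2):
        """Probability that the player rated r1 beats the player rated r2."""
        return 1.0 / (1.0 + 10 ** ((r2 - r1) / 400.0))

    print(round(win_expectancy(1785, 1700), 2))  # -> 0.62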

If we have players X and Y play each other a hundred times and see that X beats Y 58 times, that could indicate that their rating difference is a bit too high and that we should adjust the ratings. By how much? That is question 2.

Now, one could have them play only 10 times and, having X score 6 wins (instead of 6.2!), decide to update the ratings accordingly.

One could even have them play just once and, with X winning that one game (instead of 0.62!), decide to update the ratings.

So this is question 1: how often do we adjust ratings?

And assuming you do decide to update ratings after a series of wins: by how much would you update them? If the former rating difference corresponded to a We of 0.62 while the recent series implies a 0.7, would you ignore the former history and stick only to the recent one?

So question 2 is: how much weight are we to give to former games and to new games?

Question 1: How often to update ratings

In ASL, updating after every game has been common practice: AH AREA, the Tactiques European AREA, the ASO ratings, and probably WEASL and aslratings.

Some internet sources on the Elo system for chess suggest updating every 3 to 5 games. One can understand the rationale: a single win or loss doesn't mean much, but after something like a best of 5, two players' relative strength becomes clearer.

Crusader, the British ladder, updates ratings at the end of a tournament. It's a clever idea, for a typical tournament has 3 to 5 games played over 2 to 3 days. It is also useful when you just don't know the exact order in which games were played, which happens with loosely formatted tournaments (typically Oktoberfest). So, all in all, Derek's approach seems a clever one.

Yet I found that inappropriate for my site, because I have to deal with too many atypical situations. For instance, tournaments over the internet can span a whole year or so.

So I decided to update ratings... once a day! A typical tournament day will see people playing 2 or 3 games, but sometimes it may be only 1 and sometimes 5 (around October, if you see what I mean).

Question 2: How much to update ratings

If the rating difference between X and Y used to give a We of 62%, based on a series of 40+ games each, and after playing 2 games one sees X win 100% of them... does this mean the "correct" rating difference is the one that leads to 100% victories? Or 67%? Or some figure between the historical 62 and the marginal 100?

The way Elo works, there is a parameter, the K factor, that governs the inertia, or resilience, of ratings.

Here is the formula:

Rn = Rp + K * (W - We)

Rn is the player's new rating, Rp is the player's previous rating, W is the actual score achieved in the new games, and We is the win expectancy based on the rating difference before those games.

As for K, it is the factor that weights the impact of the new games' results on the ratings.

A high value gives a lot of importance to the newest games; a low one leaves the ratings pretty much unchanged.
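Here is a minimal sketch of such an update (in Python; applying it per single game and the example ratings are my assumptions, not necessarily this site's exact bookkeeping):

    def new_rating(rp, r_opp, score, k):
        """Apply one Elo update: Rn = Rp + K * (W - We).
        rp: player's previous rating, r_opp: opponent's rating,
        score: actual result (1 for a win, 0.5 for a draw, 0 for a loss)."""
        we = 1.0 / (1.0 + 10 ** ((r_opp - rp) / 400.0))
        return rp + k * (score - we)

    # Winning a game you were expected to win 62% of the time:
    print(round(new_rating(1785, 1700, 1, k=40)))  # -> 1800 (+15)
    print(round(new_rating(1785, 1700, 1, k=10)))  # -> 1789 (+4)

The same win moves the rating by about 15 points with K = 40 but only about 4 with K = 10: that is the inertia at work.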

The idea is to use different values for K depending on circumstances.

Ideally, one would like players whose skill level is still evolving to have their rating move faster than players at a more mature stage. But how could one know?

An easy one is to give a higher K factor to new players, which I do: the first 10 games count double, so to speak.

Another is to assume that players with a high rating achieved it through long experience and are therefore probably at a more mature stage. This is usually done for Elo (see the table below).
Rating          K-value
0 .. 1800       40
1800 .. 2000    30
2000 .. 2200    20
2200+           10
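As a sketch of how the table and the new-player rule could combine (in Python; reading "count double" as a doubled K for a player's first 10 games is my assumption about the bookkeeping):

    def k_factor(rating, games_played):
        """K by rating band, per the table above; new players' games weigh double."""
        if rating < 1800:
            k = 40
        elif rating < 2000:
            k = 30
        elif rating < 2200:
            k = 20
        else:
            k = 10
        if games_played < 10:  # assumed reading of "first 10 games count double"
            k *= 2
        return k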


Another idea is that when somebody quits playing for a while, his/her rating becomes outdated, and one could then give it a higher K factor. This idea is at the root of the Glicko algorithm, and I am considering giving it a try one day.