Last update: 09 February 2011
The Elo rating system, developed by Arpad Elo in the early
1960s, was the first rating system based on a probabilistic approach.
It does not award a player a subjective number of points for his/her achievements.
In a points-based system, a player would score, say, 120 points for being the Grofaz, another would get 85 for finishing
second at ASLOK, whereas a third player would get 80 for winning a local tournament.
A well-known example of that points-based approach is the ATP ranking used in tennis.
With Elo, the approach is very different.
The idea is that a more skilled player will have a higher probability of beating a less skilled one and ratings reflect that probability.
Player X's rating would then be representative of how likely it is he or she would beat another player Y who has another given rating.
This specific probability of winning is calculated using the standard Elo formula:

$$W_e = \frac{1}{10^{(R_2 - R_1)/400} + 1}$$

where $W_e$ (written We in the text) is the specific win-expectancy (probability), R1 is the player's current rating, and R2 is the opponent's current rating.
In our previous X against Y example, a value of 0.62 for We would mean X should beat Y 62 times out of a hundred.
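The win expectancy can be sketched in a few lines of Python. This is only an illustration: it assumes the standard Elo logistic curve with a 400-point scale, and the function name `expected_score` is hypothetical rather than taken from the site's code.

```python
def expected_score(r1: float, r2: float, scale: float = 400.0) -> float:
    """Win expectancy We of a player rated r1 against an opponent rated r2.

    Assumes the standard Elo curve; `scale` (400 points) is an assumption.
    """
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / scale))


# Two players with equal ratings have an even chance.
print(expected_score(1500, 1500))

# Under this curve, a gap of about 85 points gives roughly the 0.62
# used in the X against Y example.
print(round(expected_score(1585, 1500), 2))
```

Note that the two expectancies of a pairing always add up to 1: what X is expected to win, Y is expected to lose.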
If we have players X and Y play each other a hundred times and see that X beats Y only 58 times, that could indicate that their rating difference is a bit too high and that we had better adjust the ratings. By how much? This is question 2.
Now, one could have them play only 10 times and, with X scoring 6 wins (instead of 6.2!), decide to update the ratings accordingly.
One could even have them play just once and, with X winning 1 game (instead of 0.62!), decide to update the ratings.
So this is question 1: how often do we adjust ratings?
And assuming you do decide to update ratings after a series of wins: by how much would you update them?
If the former rating difference corresponded to a win expectancy of 0.62 while the recent series implies 0.7, would you ignore the former history and stick only to the recent one?
So question 2 is: how much weight are we to give to former games and to new games?
Question 1: How often to update ratings
In ASL, updating after every game has been common practice: AH AREA, Tactiques European AREA, ASO ratings and probably WEASL and aslratings.
Some internet sources on the Elo system for chess suggest updating every 3 to 5 games. One can understand the rationale, because winning or losing a single game does not mean much. After a best-of-5 series, two players' relative strength becomes clearer.
Crusader, the British ladder, updates ratings at the end of a tournament. It's a clever idea, for a typical tournament has 3 to 5 games played over 2 to 3 days.
It is also useful when you just don't know the exact order in which games were played, which happens with loosely formatted tournaments (typically Oktoberfest).
So, all in all, Derek's approach seems a clever one.
Yet I found it inappropriate for my site because I have to deal with too many atypical situations. For instance, tournaments played over the internet can span a whole year or so.
So, I decided to update ratings... once a day!
A typical tournament day will see people playing 2 or 3 games, but sometimes it may be only 1 and sometimes 5 (around October, if you see what I mean).
Question 2: How much to update ratings
If the rating difference between X and Y used to give a We of 62%, based on a series of 40+ games each, and after playing 2 games one sees X winning 100% of them... does this mean the "correct" rating difference is the one that leads to 100% victories? Or 67%? Or some figure between the historical 62 and the marginal 100?
The way Elo works, there is a parameter, the K factor, that sets the inertia or resilience of ratings.
Here is the formula:

$$R_n = R_p + K \, (W - W_e)$$

where Rn is the player's new rating, Rp is the player's previous rating, W is the result actually observed in the new games, and We is the win-expectancy for those games based on the current rating difference.
As for K, it is the factor that gives more or less weight to the new game results when ratings are updated.
A high value gives a lot of importance to the newest games; a low one leaves the ratings pretty much unchanged.
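The effect of K can be sketched in Python. This is a minimal sketch, not the site's actual code: it assumes the usual Elo update Rn = Rp + K (W - We), the standard 400-point expectancy curve, and that when several games are rated together (for instance one day's games), W and We are summed over those games. The names `expected_score` and `update_rating` are hypothetical.

```python
def expected_score(r1, r2, scale=400.0):
    """Win expectancy of a player rated r1 against r2 (standard Elo curve, assumed)."""
    return 1.0 / (1.0 + 10.0 ** ((r2 - r1) / scale))


def update_rating(rp, k, games):
    """Return the new rating Rn from the previous rating Rp.

    games: list of (opponent_rating, score) pairs, score being 1 for a win
    and 0 for a loss. Summing W and We over the batch is an assumption.
    """
    w = sum(score for _, score in games)                     # observed result W
    we = sum(expected_score(rp, opp) for opp, _ in games)    # expected result We
    return rp + k * (w - we)


# X (1585) beats Y (1500) twice in the same update period.
day = [(1500, 1), (1500, 1)]
low_k = update_rating(1585, 20, day)    # low K: the rating barely moves
high_k = update_rating(1585, 40, day)   # high K: the same wins count double
print(round(low_k, 1), round(high_k, 1))
```

With two wins where about 1.24 were expected, the rating goes up in both cases, and the gain is exactly twice as large with K = 40 as with K = 20.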
The idea is to use different values for K depending on circumstances.
Ideally, one would like players whose skill level is still evolving to have their ratings move faster than those of players at a more mature stage. But how could one know?
An easy way is to give a higher K-factor to new players, which I do: the first 10 games count double, so to speak.
Another is to assume that players with a high rating achieved it through long experience and are therefore probably at a more mature stage. This is what is usually done for Elo (see table).
| Rating | K-value |
|---|---|
| 0 .. 1800 | 40 |
| 1800 .. 2000 | 30 |
| 2000 .. 2200 | 20 |
| 2200+ | 10 |
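The two rules above can be combined in a small lookup, sketched here in Python. Several points are assumptions rather than facts from this page: that a rating sitting exactly on a boundary (1800, 2000, 2200) falls into the higher band, that the newcomer doubling is applied on top of the table value, and the function name `k_factor` itself.

```python
def k_factor(rating, games_played, newcomer_games=10):
    """Pick a K value from the rating bands, doubled for a player's first games.

    Boundary handling and the combination of both rules are assumptions.
    """
    if rating < 1800:
        base = 40
    elif rating < 2000:
        base = 30
    elif rating < 2200:
        base = 20
    else:
        base = 10
    # "First 10 games count double": double K while the player is new.
    if games_played < newcomer_games:
        base *= 2
    return base


print(k_factor(1500, 50))   # established player in the lowest band
print(k_factor(1500, 3))    # same rating, but still within the first 10 games
print(k_factor(2300, 50))   # established high-rated player
```

The returned value is what would be plugged in as K when a player's rating is updated.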
Another idea is that when somebody quits playing for a while, his/her rating becomes outdated and one could then give a higher K-factor. This idea is at the root of
the Glicko algorithm and I am considering giving it a try one day.