Suggested changes to Rating system in our new Smogon ladder

Status
Not open for further replies.

X-Act

np: Biffy Clyro - Shock Shock
is a Site Content Manager Alumnusis a Programmer Alumnusis a Smogon Discord Contributor Alumnusis a Top Researcher Alumnusis a Top CAP Contributor Alumnusis a Top Tiering Contributor Alumnusis a Top Contributor Alumnusis a Smogon Media Contributor Alumnusis an Administrator Alumnus
chaos has asked about suggesting a rating system for our Smogon ladder, and here are my suggestions.

Basically, I propose to use the Glicko-2 system, which is exactly the same as the one implemented in the Shoddy ladder, with a few modifications. Assuming that R is the mean rating, RD is the rating deviation and v is the volatility of a player, the changes I suggest are the following:
  1. The Rating displayed to the player is just round(R), not R - 4*RD as is used on Shoddy.
  2. The Rating of a player is not always shown, however. It is only shown if RD<100, otherwise the Rating of the player is provisional. This way, a new player would need to play between 20 and 25 games for his or her rating to become visible. This should hopefully deter players from creating multiple accounts.
  3. RD cannot drop below the threshold value of 60. If the RD of a player becomes less than 60, it becomes equal to 60. This allows for the rating of a frequently-competing player to continue to change at a nice pace instead of very slowly, which should help players keep playing with their current account.
  4. RD cannot go above the threshold value of 350. If it becomes greater than 350, it becomes 350. This is a very minor change, done to make a player's rating deviation be at least that of a beginning player even if the player stops playing completely.
  5. If a player does not battle on a particular day, phi (which is equal to RD / 173.7178) becomes equal to sqrt((phi^2) + 4*(v^2)) instead of sqrt((phi^2) + (v^2)) as is currently implemented (and then the new RD becomes the new phi * 173.7178). This change makes a frequently-competing player's rating go provisional after about 14 consecutive days of inactivity, which should deter players from occupying the top of the ladder for a long time without playing. It also has the effect of making a player's rating become as uncertain as that of a beginning player after about 9 months of inactivity (which means that if you don't play for 9 straight months, the ladder would consider you a noob even if you were #1 before you stopped playing).
I'd like to have some comments from players that participate on the ladder to see if the above points address what they believe are shortcomings of the Shoddy ladder, and points for further improvement.
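The five changes above can be sketched in a few lines of Python. This is only an illustration, not the Shoddy code: the function names are made up for the example, and the volatility value used in the usage note (v = 0.06) is an assumption, since the thread never states what volatility ladder players actually have.

```python
import math

GLICKO2_SCALE = 173.7178        # RD = phi * 173.7178, as in point 5
RD_MIN, RD_MAX = 60.0, 350.0    # thresholds from points 3 and 4
PROVISIONAL_RD = 100.0          # point 2: rating is shown only if RD < 100


def clamp_rd(rd):
    """Keep RD inside [60, 350] (points 3 and 4)."""
    return max(RD_MIN, min(RD_MAX, rd))


def displayed_rating(r, rd):
    """Return round(R) for an established player, None if provisional (points 1 and 2)."""
    return round(r) if rd < PROVISIONAL_RD else None


def idle_day(rd, v, factor=4.0):
    """RD after one day without battles (point 5).

    factor=4 is the proposed value; factor=1 is the unmodified update.
    """
    phi = rd / GLICKO2_SCALE
    new_phi = math.sqrt(phi ** 2 + factor * v ** 2)
    return clamp_rd(new_phi * GLICKO2_SCALE)


def idle_days_until(rd, target_rd, v, factor=4.0):
    """Count consecutive idle days until RD reaches target_rd."""
    days = 0
    while rd < target_rd:
        rd = idle_day(rd, v, factor)
        days += 1
    return days
```

Under the assumed v = 0.06, `idle_days_until(60, 100, 0.06)` comes out at roughly two weeks and `idle_days_until(60, 350, 0.06)` at roughly nine months, which is in line with the figures in point 5. A different volatility would shift both numbers.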
 

Ancien Régime

washed gay RSE player
is a Top Team Rater Alumnusis a Battle Simulator Moderator Alumnus
Obviously we talked about it on #insidescoop, but I agree 100% with these changes. I hated having to make new nicks and alts because my progress on the ladder was basically halted after a certain amount of time. I feel that a rating system that rewards (or at least doesn't punish) consistency is the best way to go.
 

Great Sage

Banned deucer.
I also agree with all of these changes. The only part I have a slight objection to is the bolded part of number 5; 10 days is a bit short, IMO.
 

X-Act

np: Biffy Clyro - Shock Shock
Okay, I'll make it 14 days. It's a pretty simple fix; I just need to replace the '6' in the formula with '4'. :) As a result, the time taken to return to an RD of 350 is now 9 months, not 6 months.

Just wanted to ask something. The Shoddy page says that the ladder system tries to match you with a player having a conservative rating estimate (CRE) close to yours. The CRE is the infamous R - 4 x RD used by Colin to represent a rating. Since we're going to just use R to represent a player's rating, that part of the program should be changed so that the ladder system searches for a player whose Rating R is close to yours, not the CRE.
 

Aeolus

Bag
is a Top Tutor Alumnusis a Tournament Director Alumnusis a Site Content Manager Alumnusis a Battle Simulator Admin Alumnusis a Smogon Discord Contributor Alumnusis a Top Tiering Contributor Alumnusis a Top Contributor Alumnusis an Administrator Alumnusis a Top Dedicated Tournament Host Alumnus
looks great to me. Another thing I like about this is that comparing ratings on our server to player ratings on Official Server will not be possible.
 
sweet. this might get me laddering again. i hated it when my rating on shoddy got to like 1600 and never increased which made me quit shoddying :(
 

obi

formerly david stone
is a Site Content Manager Alumnusis a Programmer Alumnusis a Senior Staff Member Alumnusis a Smogon Discord Contributor Alumnusis a Researcher Alumnusis a Top Contributor Alumnusis a Battle Simulator Moderator Alumnus
Unfortunately, I am unaware of just how much someone would have to play to get their RD below 100, so it's possible that rule means this first part isn't an issue.

The reason Shoddy uses the 4*RD part is that Glicko doesn't attempt to give you a single rating, but rather, a range of values. Displaying just R is saying that the player has a 50% chance to have an actual skill level at or above that value. For new players, their rating range is rather large because Glicko isn't quite sure just where they are. When Colin looked at the list when sorted by R, nearly every player at the "top" was someone he and I had never heard of. Subtracting four deviations is saying "This player has a 99%+ chance of having this rating or higher", which has the effect of only including more certain players.
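As a rough check of those percentages, here is a small Python sketch. It assumes the player's true skill is normally distributed around R with standard deviation RD; the function names are invented for the example.

```python
import math


def conservative_rating_estimate(r, rd):
    """CRE as displayed on Shoddy: R - 4 * RD."""
    return r - 4.0 * rd


def prob_skill_at_least(r, rd, threshold):
    """P(true skill >= threshold), assuming skill ~ Normal(r, rd)."""
    z = (r - threshold) / rd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

With this model, `prob_skill_at_least(r, rd, r)` is 0.5 for any player, while `prob_skill_at_least(r, rd, conservative_rating_estimate(r, rd))` is above 0.9999 regardless of R and RD.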

As for rule change 5, that really gets to the heart of what the purpose of the ladder is. If the purpose is to create an environment in which people are trying to get to the top and then have to fight to maintain it, then yes, having more "rating decay" is good. If the purpose of the ladder is to rank players in terms of their skill, then the "rating decay" should be roughly equal to the loss of skill over time (so much, much lower than on the Official Server).

As far as I can tell, in combination with what you proposed in 1. (the use of R over anything involving RD), this will give no "rating decay", so the only issue is keeping yourself from becoming provisional.

looks great to me. Another thing I like about this is that comparing ratings on our server to player ratings on Official Server will not be possible.
How is this a good thing?
 

Ancien Régime

washed gay RSE player
Unnecessary arguing, in the sense of "well my rating's better on official/my rating's better on Smogon" or even "Official/Shoddy has better players", which I'm not sure we want to get into.
 

X-Act

np: Biffy Clyro - Shock Shock
Unfortunately, I am unaware of just how much someone would have to play to get their RD below 100, so it's possible that rule means this first part isn't an issue.
It takes roughly 20 to 25 battles for your RD to become below 100.

The reason Shoddy uses the 4*RD part is that Glicko doesn't attempt to give you a single rating, but rather, a range of values. Displaying just R is saying that the player has a 50% chance to have an actual skill level at or above that value. For new players, their rating range is rather large because Glicko isn't quite sure just where they are. When Colin looked at the list when sorted by R, nearly every player at the "top" was someone he and I had never heard of. Subtracting four deviations is saying "This player has a 99%+ chance of having this rating or higher", which has the effect of only including more certain players.
I know this, and this is why I'm making all ratings having RD 100 or more provisional. If RD is that large, the rating isn't reliable, but is extremely uncertain; hence, it's provisional. And yeah, I looked into that list that Colin made. All of those players that came up at the top that you 'did not know' would have had a provisional rating in this new system, so they would actually not appear at all (or appear at the bottom as 'provisional').

Here is an old list that Colin has posted to prove his point that R - 4 x RD is the way to go. I added the RD at the end of each player's list:

Code:
+-----------------+------------------+------------------+------+--------+
| name            | mean             | cre              | rank |   RD   | 
+-----------------+------------------+------------------+------+--------+
| Riptor          | 2001.38839416590 | 1062.40892193233 | 3005 | 234.74 |
| TAY             | 1986.88929284469 | 1539.31163028140 |   67 | 111.89 |
| pokeboy         | 1936.96546149701 | 1286.74050683289 |  936 | 162.56 |
| Cruel           | 1922.73953480444 | 1260.29075970672 | 1135 | 165.61 |
| Dietrich        | 1922.46008104249 | 1474.02080852881 |  157 | 112.11 |
| Astrohawke      | 1912.12921847407 | 1506.23953086525 |  113 | 101.47 |
[B]| goofball        | 1909.65642964082 | 1687.27542634793 |    2 |  55.60 |[/B]
[B]| depom           | 1905.66564656608 | 1680.62135901767 |    3 |  56.26 |[/B]
| Ultimatehero124 | 1904.45073688254 | 1065.74445853246 | 2966 | 209.68 |
| cfickle         | 1903.27813162474 | 1207.76685711231 | 1549 | 173.88 |
| icepick         | 1892.06731203119 | 965.700557231403 | 3932 | 231.59 |
| Cerberus.       | 1885.06702338470 | 1242.31276758645 | 1268 | 160.69 |
[B]| jrrrrrrr        | 1884.51791441567 | 1631.38600598089 |   14 |  63.28 |[/B]
| KingGarchomp    | 1878.47795320330 |  733.72568498120 | 5734 | 286.19 |
[B]| Slice-T_A       | 1878.02889092308 | 1624.60981701875 |   17 |  63.35 |[/B]
[B]| goofballSKY     | 1873.45047086468 | 1640.56885806744 |   12 |  58.22 |[/B]
[B]| goofballANGRY   | 1870.11539263519 | 1620.81293485045 |   19 |  62.33 |[/B]
| chansey_slayer  | 1857.58419165324 | 972.744433035372 | 3864 | 221.21 |
| Swordzman       | 1856.33698216188 | 858.235378512444 | 4815 | 249.53 |
| Infernape       | 1856.04245327237 | 672.297485107073 | 6170 | 295.94 |
+-----------------+------------------+------------------+------+--------+
In this new system, the only players out of the above that would be listed on the ladder are the ones in bold. They would be listed as #1, #2, #3, etc. All the other players would have provisional ratings.
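To make the filtering concrete, here is a short Python sketch that applies the RD < 100 rule to the list above. The means are rounded to two decimals from the table; the function name and tuple layout are just choices made for this example.

```python
# (name, mean rating R, RD), copied from the table above with R rounded.
LADDER = [
    ("Riptor", 2001.39, 234.74), ("TAY", 1986.89, 111.89),
    ("pokeboy", 1936.97, 162.56), ("Cruel", 1922.74, 165.61),
    ("Dietrich", 1922.46, 112.11), ("Astrohawke", 1912.13, 101.47),
    ("goofball", 1909.66, 55.60), ("depom", 1905.67, 56.26),
    ("Ultimatehero124", 1904.45, 209.68), ("cfickle", 1903.28, 173.88),
    ("icepick", 1892.07, 231.59), ("Cerberus.", 1885.07, 160.69),
    ("jrrrrrrr", 1884.52, 63.28), ("KingGarchomp", 1878.48, 286.19),
    ("Slice-T_A", 1878.03, 63.35), ("goofballSKY", 1873.45, 58.22),
    ("goofballANGRY", 1870.12, 62.33), ("chansey_slayer", 1857.58, 221.21),
    ("Swordzman", 1856.34, 249.53), ("Infernape", 1856.04, 295.94),
]


def established_ladder(players, max_rd=100.0):
    """Drop provisional players (RD >= max_rd) and rank the rest by R."""
    shown = [p for p in players if p[2] < max_rd]
    shown.sort(key=lambda p: p[1], reverse=True)
    return [
        (rank, name, round(mean))
        for rank, (name, mean, _rd) in enumerate(shown, start=1)
    ]
```

Calling `established_ladder(LADDER)` keeps only the six bolded players, ranked by R with goofball first, and leaves everyone else off the visible list.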

As for rule change 5, that really gets to the heart of what the purpose of the ladder is. If the purpose is to create an environment in which people are trying to get to the top and then have to fight to maintain it, then yes, having more "rating decay" is good. If the purpose of the ladder is to rank players in terms of their skill, then the "rating decay" should be roughly equal to the loss of skill over time (so much, much lower than on the Official Server).
There would be no rating decay in this system, and that's why I made the RD increase faster in this system. One could obtain a top 10 ranking and then stop playing, looking at his rating up there. With RD increasing faster, he would have 14 days before his rating drops to provisional (and only if his RD is 60; if it is higher, it would take him even less time to become provisional).

As far as I can tell, in combination with what you proposed in 1. (the use of R over anything involving RD), this will give no "rating decay", so the only issue is keeping yourself from becoming provisional.
Exactly. And that's why I made the rating go provisional quicker than normal. I actually made it go provisional in one week at first, then I made it 10 days. Then people suggested making it go provisional in 14 days, and I changed it accordingly.
 

X-Act

np: Biffy Clyro - Shock Shock
I did a simulation in Excel using the proposed rating system. Interestingly, the volatility increases dramatically when the players facing each other have mean ratings very far from each other. This happens in Shoddybattle because the player it finds to play against you is the one that has the nearest CRE to yours, not the nearest rating. I tested by playing two games yesterday (with a crap team) to confirm this.

When the player chosen to play against you has a nearer mean rating (and, if possible, a close RD as well), the volatility was better.

I don't know if this is possible to implement, but I'd suggest that the opponent that is proposed for playing against you on the ladder is one that has similar mean rating and similar RD to yours, not similar CRE.
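A minimal Python sketch of the difference between the two matching rules follows. It is not how Shoddy actually picks opponents: the distance formula, the `rd_weight` parameter and the sample players are all assumptions made for illustration.

```python
def cre(r, rd):
    """Conservative rating estimate, R - 4 * RD."""
    return r - 4.0 * rd


def pick_by_cre(player, candidates):
    """Opponent whose CRE is nearest to the player's CRE."""
    target = cre(player[1], player[2])
    return min(candidates, key=lambda c: abs(cre(c[1], c[2]) - target))


def pick_by_rating_and_rd(player, candidates, rd_weight=1.0):
    """Opponent with the nearest mean rating, with RD difference as a secondary term."""
    _, r, rd = player

    def distance(c):
        return abs(c[1] - r) + rd_weight * abs(c[2] - rd)

    return min(candidates, key=distance)


# Hypothetical (name, R, RD) entries, invented for this example.
me = ("me", 1700.0, 60.0)
pool = [("newcomer", 1900.0, 110.0), ("veteran", 1720.0, 62.0)]
```

Here `pick_by_cre(me, pool)` returns the newcomer, whose CRE of 1460 equals mine even though our mean ratings are 200 points apart, while `pick_by_rating_and_rd(me, pool)` returns the veteran.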
 

chaos

is a Site Content Manageris a Battle Simulator Administratoris a Programmeris a Smogon Discord Contributoris a Contributor to Smogonis an Administratoris a Tournament Director Alumnusis a Researcher Alumnus
Owner
im thinking its best just to drop the ladder changes. too many people are freaking out, and until the bug is fixed in the shoddy client we cant do anything about it
 

X-Act

np: Biffy Clyro - Shock Shock
That's okay. I'll continue to research on this so that hopefully Competitor will implement it.
 