Partially Approved: Get core data + win rates (to improve teambuilding)

Hi everyone,

Would it be possible to get Smogon ladder data listing all the different cores used on the ladder (in any tier), along with their win rates? (Only 1760-1825+ Elo would be enough, imo.)
Many other people and I agree that it would be incredibly useful in teambuilding. In practice, I mean having data exactly like that, but for the current ladder (1760/1825+ Elo). Would it be possible?

If it's not possible because of:
- Missing code + a lack of coders: I can code this if you give me access to the PS API, as I am a developer
- Performance/server reasons: please DM me, I may be able to help by providing that

Thanks a lot,

Nico
 
I know, but if this improvement would take about 5 minutes to write, or is just a copy-paste from the tournament stats, I'd prefer it to be done by a PS/Smogon developer rather than by me, since I would have to spend hours working out exactly how the code works before writing anything, which would be a huge waste of time, you know :]
So that's why I'm first asking whether a PS/Smogon dev can do it (if it's not too long or too big a job), with me as a last resort.
 

Zokuru

The Stall Lord
is a Tiering Contributor
Hey !

It would be super cool to have that kind of information on a larger scale than we have now. It exists for high-level tournaments, but the number of games on the ladder is HUGE, so having all of this could be really useful.

pre Sorry to tag you randomly like that, but I showed this to Vapicuno and he told me you had already thought a bit about it, so maybe you could tell us how doable this is, or anything else you want to say on the topic.
 
Pinging DoW since I know he's been working on a rewrite of the program that generates monthly usage stats so this might be something included or to include.

On my end I have a slightly improved version of the tool used for tournament stats, which also includes stats for cores of more than 2 Pokémon and handles Pokémon with different formes like Silvally (useful for lower tiers). However, it relies on either battle logs or replay links, so only someone with access to ladder logs would be able to run it.
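To make the idea of "cores of more than 2" concrete, here is a minimal sketch of how such a tally could work. It is not the actual tournament tool: the `FORME_ALIASES` table, the function names, and the input shape (a list of winning-team/losing-team pairs) are all assumptions made for illustration. In particular, merging the Silvally formes is just one possible policy; a real tool might do the opposite and keep each forme separate.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical alias table: name as it appears in a log -> name to count it under.
# Merging Silvally formes here is only an illustrative policy choice.
FORME_ALIASES = {
    "Silvally-Steel": "Silvally",
    "Silvally-Fairy": "Silvally",
}


def normalize(name):
    """Map a logged name to the name it is counted under (identity by default)."""
    return FORME_ALIASES.get(name, name)


def cores(team, size):
    """All cores of `size` members on a team, each as a sorted tuple of names."""
    species = sorted({normalize(member) for member in team})
    return list(combinations(species, size))


def core_winrates(battles, size):
    """Tally wins and games per core.

    `battles` is an iterable of (winning_team, losing_team) pairs, where each
    team is a list of names. Returns {core: (win_rate, games)}.
    """
    tally = defaultdict(lambda: [0, 0])  # core -> [wins, games]
    for winner, loser in battles:
        for core in cores(winner, size):
            tally[core][0] += 1
            tally[core][1] += 1
        for core in cores(loser, size):
            tally[core][1] += 1
    return {core: (wins / games, games) for core, (wins, games) in tally.items()}
```

Calling `core_winrates(battles, 3)` instead of `core_winrates(battles, 2)` is all it takes to move from pairs to trios, which is also where the number of distinct cores starts to grow quickly.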
 

DoW

formerly Death on Wings
Yes, I decided to build a Postgres database to handle the PS! stats, with the idea of replacing the current flat files / Python scripts situation, building on Antar's ideas (see https://pokemetrics.wordpress.com/ and https://github.com/Antar1011/Onix, although mine was entirely separate and built from the ground up).

My main motivation was that I wanted to look at more detailed stats than are currently available, such as those the OP asks for. I also thought that giving communities and TLs access to more detail might help improve both the competitive metagame and tiering decisions. It would certainly include everything being asked for here.

I PM'd Antar about this, who liked the idea of this plan actually being implemented, and then Zarel, who said "Pulling stats is annoyingly hard. I would probably want to rearchitect that entire thing first.", which I took as a "go for it".

This happened in 2018, and I proceeded to put together a database that would be able to handle everything the current scripts can do, plus some Python scripts that pull data from the DB to reproduce what the current Python scripts output.
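For readers wondering what "a database plus scripts that pull from it" might look like, here is a rough sketch. The actual schema is not shown in this thread, so every table, column, and function name below is hypothetical, and Python's built-in `sqlite3` stands in for Postgres so the example runs without a server.

```python
import sqlite3

# Hypothetical schema: one row per battle, per side, and per team member.
SCHEMA = """
CREATE TABLE battle (
    id   INTEGER PRIMARY KEY,
    tier TEXT NOT NULL
);
CREATE TABLE side (
    battle_id   INTEGER NOT NULL REFERENCES battle(id),
    player_slot INTEGER NOT NULL,
    rating      REAL    NOT NULL,
    won         INTEGER NOT NULL,
    PRIMARY KEY (battle_id, player_slot)
);
CREATE TABLE team_member (
    battle_id   INTEGER NOT NULL,
    player_slot INTEGER NOT NULL,
    species     TEXT    NOT NULL,
    FOREIGN KEY (battle_id, player_slot) REFERENCES side(battle_id, player_slot)
);
"""

# Self-join on team_member to get each unordered pair once (a.species < b.species).
PAIR_QUERY = """
SELECT a.species, b.species, COUNT(*) AS games, SUM(s.won) AS wins
FROM team_member a
JOIN team_member b
  ON a.battle_id = b.battle_id
 AND a.player_slot = b.player_slot
 AND a.species < b.species
JOIN side s
  ON s.battle_id = a.battle_id AND s.player_slot = a.player_slot
JOIN battle bt ON bt.id = a.battle_id
WHERE bt.tier = ? AND s.rating >= ?
GROUP BY a.species, b.species
ORDER BY games DESC, a.species, b.species
"""


def build_db():
    """Create an in-memory database with the hypothetical schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def add_battle(conn, battle_id, tier, sides):
    """Insert one battle; `sides` is a list of (rating, won, [species, ...])."""
    conn.execute("INSERT INTO battle (id, tier) VALUES (?, ?)", (battle_id, tier))
    for slot, (rating, won, team) in enumerate(sides, start=1):
        conn.execute(
            "INSERT INTO side VALUES (?, ?, ?, ?)",
            (battle_id, slot, rating, int(won)),
        )
        conn.executemany(
            "INSERT INTO team_member VALUES (?, ?, ?)",
            [(battle_id, slot, species) for species in team],
        )


def pair_stats(conn, tier, min_rating):
    """Rows of (species_a, species_b, games, wins) for sides at or above min_rating."""
    return conn.execute(PAIR_QUERY, (tier, min_rating)).fetchall()
```

The appeal of this layout is that a new question (a different rating cutoff, a different tier) becomes a new query rather than a new pass over the raw logs, at the cost of having to keep the database populated in the first place.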

However, for the project to really succeed, I think it needs to be integrated into the PS! code so that the DB is updated automatically with every game, and possibly even handles Elo for the server. I'm not particularly familiar with the PS! codebase, though, the parts I'd need to work on are the ones I know least, and I've never worked with TypeScript before, so progress has been extremely slow. In short, I haven't committed to my git repo since before quarantine, because work has been very busy and because I, uh, got a bit distracted.

I very much intend to speak to Marty about this project when I have free time again and, assuming he doesn't have any issues, continue work on this. I was nearly at the point where I could run a test PS! server, although I suspect the minute I look at my code I'll decide I need to rewrite about half of it. But that won't be done this year, at least.

If anyone wants to help me develop this, feel free to PM me. For some reason I decided to put this project on Bitbucket rather than GitHub, but I'll happily give access to anyone who wants to help work on it, and if that's actually going to happen I can take a look and spend some time cleaning up rough edges and committing code that's been sitting in my local repo.
 

pre

pkmn.cc
Two comments on this specific suggestion:
  • Win rates are strictly inferior to weighted statistics (except perhaps in being more understandable to the layman). Tournaments don't have ratings, which is why win rates are used there; ladder stats do have ratings.
  • You can perform arbitrarily complex statistics (like around cores) on a small number of logs, but this does not scale at all to the amount of data we are talking about (pairwise is perhaps feasible; going beyond pairs is almost certainly intractable with reasonable runtime and memory).
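As a rough illustration of what "weighted" can mean for pairs, the sketch below weights each team by the probability that its player's true rating is above a cutoff, assuming a normal model around the rating with the given deviation. That weighting formula is an assumption chosen for illustration, not a confirmed description of the production scripts, and all function names and the input shape are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import erf, sqrt


def weight(rating, deviation, cutoff):
    """Probability that the true rating exceeds `cutoff` under a normal model.

    Assumed form: 0.5 * (1 + erf((rating - cutoff) / (deviation * sqrt(2)))).
    """
    return 0.5 * (1.0 + erf((rating - cutoff) / (deviation * sqrt(2.0))))


def weighted_pair_usage(teams, cutoff):
    """Rating-weighted usage of every pair.

    `teams` is an iterable of (rating, deviation, [species, ...]).
    Returns {pair: weighted share of teams containing that pair}.
    """
    totals = defaultdict(float)
    total_weight = 0.0
    for rating, deviation, team in teams:
        w = weight(rating, deviation, cutoff)
        total_weight += w
        for pair in combinations(sorted(set(team)), 2):
            totals[pair] += w
    if total_weight == 0.0:
        return {}
    return {pair: total / total_weight for pair, total in totals.items()}
```

Unlike a hard "1760+ only" filter, a weight like this lets teams just below the cutoff still count a little and teams far above it count almost fully, so the result does not depend as sharply on where the line is drawn.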
On usage statistics in general:

Antar's scripts are riddled with bugs and take 24-48 hours to run to completion over a full month of logs (Marty is just a workhorse and runs them frequently on subsets of the logs so that at the end of the month he can get stats up quickly). I have rewritten them in TypeScript to be bug-for-bug compatible and got them to run in ~15-20 minutes over the same data: https://github.com/pkmn/stats. We have not yet switched to this system for the stats dumps (there is this TODO list, but the stats revamp will also come with other things, like a new output format, API endpoints for fetching stats on specific mons, a visualizer UI, etc.). The main goal so far has been to fix its performance and make it maintainable (and, in the process, fix its bugs). We have also discussed using a database, but that is not a blocker here (and comes with a bevy of its own concerns).

Until the architecture rewrite (linked above) is finished, attempting to add anything to usage stats is kind of a doomed mission. Once the rewrite has landed, people should feel very welcome to open PRs or suggest features, but this is one area of development that is certainly going to be less approachable to developers unless they have a strong understanding of statistics, computational complexity, cardinality, scaling systems, resource constraints, etc. Quite simply, something that provides detailed information on a handful of MB of data does not necessarily scale to the TBs of data we have, and the challenge with stats is less about figuring out how to compute something than about how to compute it at scale.
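A quick back-of-the-envelope calculation shows why cardinality becomes the problem as cores get larger. The pool size of 100 species below is a made-up round number, not a real tier count, and the figures are upper bounds on distinct cores, since not every combination will actually appear on the ladder.

```python
from math import comb


def cores_per_team(team_size, core_size):
    """Number of cores of a given size contained in a single team."""
    return comb(team_size, core_size)


def max_distinct_cores(pool_size, core_size):
    """Upper bound on distinct cores that can be drawn from a pool of species."""
    return comb(pool_size, core_size)


if __name__ == "__main__":
    pool = 100  # hypothetical number of species seen in one tier
    for size in (2, 3, 4):
        print(
            f"size {size}: {cores_per_team(6, size)} cores per team, "
            f"up to {max_distinct_cores(pool, size)} distinct keys"
        )
```

With these assumed numbers, the key space goes from 4,950 possible pairs to 161,700 trios to 3,921,225 quads, and every team in every log contributes 15 to 20 increments per core size, which is the kind of growth that fits in memory for a tournament but not necessarily for a month of ladder logs.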

If you are at all interested in contributing to the usage stats work in any way or simply want to watch for progress in this space, please join https://spo.ink/dev (#other-dev is the place to discuss this) or feel free to DM me on Discord.
 
