Programming Sets Aggregator, Archetype/Core Statistics, and Teambuilder Sorter from Pokemon Showdown

vapicuno · May 5, 2019

Hello everyone,

This project was born out of a conversation with Sadlysius, who felt that it would be nice to have a sets compendium for ADV OU. Since then, I've worked on extending this project to analyzing full teambuilders. In the illustrations, I use the team dump https://pastebin.com/bWP6S1hv from thelinearcurve (who has kindly let me make this analysis public) in this RoA post as examples.

I hope this helps some of you to understand your own teambuilding tendencies, make the building process simpler, and give your friends and teammates some idea of your style. Without further ado,

Universal Features - A Short Prelude

This is a program that takes in a pokemon showdown builder and spits out a bunch of things.

Both standard human-friendly format and the condensed format that showdown uses for backup with too many teams are allowed (but not both at the same time, so mix and match builders together in showdown before downloading)
The code processes teams from all metagames simultaneously, but categorizes results by metagame, so there is no need to select teams by metagame
An option to exclude unfinished teams is provided. If you enable it, unfinished teams have no effect on the sets compiler, builder stats, or sorter.
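
For readers who haven't looked at a builder file before, here is a minimal sketch of reading one set in the human-friendly export format into a dictionary. `parse_set` is a hypothetical helper written for this illustration, not the parser used in the repository, and it deliberately ignores nicknames, genders, IVs and levels.

```python
def parse_set(text):
    """Parse one set in Showdown's human-friendly export format (simplified)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    # First line is assumed to be "Species @ Item" (no nickname or gender).
    species, _, item = lines[0].partition(" @ ")
    mon = {"species": species.strip(), "item": item.strip(),
           "ability": "", "nature": "", "evs": {}, "moves": []}
    for ln in lines[1:]:
        if ln.startswith("Ability:"):
            mon["ability"] = ln[len("Ability:"):].strip()
        elif ln.startswith("EVs:"):
            # "252 HP / 252 Atk / 4 Spe" -> {"HP": 252, "Atk": 252, "Spe": 4}
            for part in ln[len("EVs:"):].split("/"):
                value, stat = part.split()
                mon["evs"][stat] = int(value)
        elif ln.endswith(" Nature"):
            mon["nature"] = ln[:-len(" Nature")]
        elif ln.startswith("- "):
            mon["moves"].append(ln[2:].strip())
    return mon


example = """
Tyranitar @ Choice Band
Ability: Sand Stream
EVs: 252 HP / 252 Atk / 4 Spe
Adamant Nature
- Rock Slide
- Earthquake
- Hidden Power [Bug]
- Focus Punch
"""

tyranitar = parse_set(example)
print(tyranitar["item"], tyranitar["evs"], tyranitar["moves"])
```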

Sets Compiler and Statistics

Interested in making a set list for performing calcs in tours? Or perhaps reusing some sets for your new teams? Compiling sets is relatively easy; categorizing them into something meaningful and intuitive is the hard part. In this update, I've added the following features:

1. Combining sets based on EVs/IVs/Up to two moveslots
2. Displaying set usage stats

If you're an ADV OU player you probably have a ridiculous number of Tyranitar variants, but when we talk about it, there are only a few generic themes -- setup sweeper, bulky 4atk, choice band, bulky pursuit, fast mix lead. Most of these functions are differentiated by item, significantly different EV spreads, and usually up to two key moves. Filler moves and small EV changes like speed creeps and where you put your 4 in a 252/252 build aren't as critical in a teambuilding decision.

EV similarities are determined by a threshold -- if you need to move fewer than some number of EVs to go from one set to another, the sets compiler considers them the same set. This is up to you to tweak, but there are some rules of thumb you can follow. For example, in ADV OU, most pokemon have base speeds in multiples of 5, so speed stats are frequently separated by 10's. Suicune is at 206, and Milotic is at 196. Thus, a speed creep of more than 10 points, or 40 EVs, allows you to access another speed tier and can be considered qualitatively different. You can then make the threshold a little smaller than 40 EVs to account for small variations in creep. The same thing is implemented for IVs.
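
The threshold idea can be sketched as follows. This is only an illustration under my own assumptions: `evs_moved` counts how many EVs have to be taken out of (or put into) stats to turn one spread into another, and the default of 36 is just an example of "a little smaller than 40". The names and the exact distance measure are hypothetical and may differ from what the program does.

```python
STATS = ("HP", "Atk", "Def", "SpA", "SpD", "Spe")


def evs_moved(a, b):
    """EVs that must be shifted to turn spread a into spread b (assumed metric)."""
    removed = sum(max(a.get(s, 0) - b.get(s, 0), 0) for s in STATS)
    added = sum(max(b.get(s, 0) - a.get(s, 0), 0) for s in STATS)
    # If the two spreads use a different total, count the larger side.
    return max(removed, added)


def same_spread(a, b, threshold=36):
    """Treat two spreads as one set when fewer than `threshold` EVs differ."""
    return evs_moved(a, b) < threshold


standard = {"HP": 252, "Atk": 252, "Spe": 4}
other_four = {"HP": 252, "Atk": 252, "Def": 4}    # only the leftover 4 EVs moved
speed_creep = {"HP": 252, "Atk": 212, "Spe": 44}  # 40 EVs moved into Speed

print(evs_moved(standard, other_four), same_spread(standard, other_four))
print(evs_moved(standard, speed_creep), same_spread(standard, speed_creep))
```

With these example spreads, moving the leftover 4 EVs keeps the sets merged, while a 40 EV speed creep (10 stat points at level 100) is enough to split them.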

That may still leave a lot of extraneous sets if you're a real wacky builder. As a last resort, I have also included an option to cut out the least used fraction of sets from the sets compendium produced at the end.

The result of processing thelinearcurve's builder is in this paste: https://pokepast.es/bc214af8b0a1e92a

Here's an illustrative example of set aggregation: Choice Band Metagross sets are extracted from the builder and condensed into one, where the filler moves Rock Slide/Double-Edge/HP Bug are placed in descending order of usage. The stats below the set indicate that together, these make up 18 Choice Band Metagross sets, or 26.5% of the total of 68 Metagross appearances. Rock Slide appears in 12 of those sets, or 17.6%.
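
Here is a small sketch of how such a slashed set and its usage numbers could be produced. The totals (18 Band sets, 12 with Rock Slide, 68 Metagross overall) come from the example above; the three shared moves and the 4/2 split between Double-Edge and HP Bug are made up for illustration, and `aggregate_moves` is a hypothetical helper rather than the function in the repository.

```python
from collections import Counter


def aggregate_moves(movesets):
    """Split moves into those on every set and fillers ordered by usage."""
    total = len(movesets)
    counts = Counter(move for moves in movesets for move in set(moves))
    shared = sorted(m for m, c in counts.items() if c == total)
    fillers = sorted((m for m, c in counts.items() if c < total),
                     key=lambda m: (-counts[m], m))
    return shared, fillers, counts


# Assumed shared moves; only the filler slot differs between the sets.
base = ["Meteor Mash", "Earthquake", "Explosion"]
band_sets = ([base + ["Rock Slide"]] * 12
             + [base + ["Double-Edge"]] * 4         # illustrative split
             + [base + ["Hidden Power Bug"]] * 2)   # illustrative split

shared, fillers, counts = aggregate_moves(band_sets)
metagross_total = 68  # all Metagross appearances in the builder

band_share = 100 * len(band_sets) / metagross_total
rock_slide_share = 100 * counts["Rock Slide"] / metagross_total

print("Filler slot:", " / ".join(fillers))
print(f"{len(band_sets)} sets, {band_share:.1f}% of Metagross")
print(f"Rock Slide: {counts['Rock Slide']} sets, {rock_slide_share:.1f}%")
```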

In the following example, HP Ice is used more frequently than HP Grass, and also Roar more than Baton Pass. Thus, four different Zapdos sets are all grouped into a single set with slashed moves. However, I do not combine sets with two moves that appear more likely in pairs. For example, in ADV OU, we see Toxic and Protect on Magneton to make use of poison damage, or Thunder Wave and Substitute to fish paralysis. Toxic and Substitute, or Thunder Wave and Protect are comparatively rare. When moves come in pairs, I find it helpful to separate them into different sets to make it clear.

Builder Statistics

Would you like your friends and teammates to understand your style better so that they can offer suggestions for matchups and improvement? Would you like to understand theirs? Or if you're a kind soul like thelinearcurve, would you like to share stats with the community for discussion? We often talk about usage stats and viability rankings in terms of pokemon, but there's a lot that can be said about combinations as well. In this project, I have taken the liberty of generating builder statistics for cores and archetypes. In particular, I introduce the concepts of Synergy and Confidence, which are different from merely Frequency. The results of this code can be seen in my extensive analysis of ADV OU by clicking here. I quote verbatim the philosophy and methodology:

What is a Core, and what is an Archetype?

A core is a group of mons that function well together. In other words, they have synergy. An archetype is a group of teams that encompass a similar gameplan. Archetypes are frequently thought of as defined by a few key mons that are supplemented by a variety of cores that can be substituted with each other. Likewise, a core may belong to more than one archetype.

Methodology

The technical details of the work are in the spoilers below. Check them out if you want to know how the sets are categorized, how the synergies are measured, and how the archetypes are determined.

Categorization and Naming of Sets

Perhaps the most tricky thing about this project is the categorization of sets. There are more unique sets than teams in the builders to work with, so I ruled out using unsupervised clustering, meaning I have to actually think about how to split the sets instead of letting the computer do it. I have to thank Disaster Area and Zokuru for many ideas and discussions we've had on finding a good methodology.

The benchmark for a good categorization tool is Tyranitar. It has so many sets: Physical 4-attack, Choice Band, Dragon Dance, Pursuit, Mix Lead. Within DD variants, there are HP Grass or HP Bug fillers, and there are defensive and offensive EV spreads. A good methodology should distinguish all these sets from each other, and be able to name them appropriately.

I started with the item, and not just any item, but Choice Band (1). It is the single most easy-to-spot and accurate predictor of the set's function. Taking CBtar out of the equation, I noticed that all the sets above have unique EV priorities. Physical 4-attack is HP/Atk invested, DD is either Atk/Spe or HP/Spe, Pursuit is HP/SpA, Mix Lead is SpA/Spe. I thus created categories based on the top two EVs of the set (2).
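
Criteria (1) and (2) can be sketched in a few lines. This is an illustration under my own assumptions, not the code from the repository: `category` is a hypothetical name, ties between equal EVs are broken by stat order, and the example spreads are invented to match the Tyranitar sets described above.

```python
STAT_ORDER = ("HP", "Atk", "Def", "SpA", "SpD", "Spe")


def category(mon):
    """Label a set by Choice Band first, otherwise by its two largest EVs."""
    if mon.get("item") == "Choice Band":
        return "Band"
    evs = mon.get("evs", {})
    # sorted() is stable, so stats with equal EVs keep their stat order.
    ranked = sorted(STAT_ORDER, key=lambda s: -evs.get(s, 0))
    top_two = sorted(ranked[:2], key=STAT_ORDER.index)
    return "/".join(top_two)


# Hypothetical Tyranitar spreads for illustration
cb_tar = {"item": "Choice Band", "evs": {"HP": 252, "Atk": 252, "Spe": 4}}
dd_tar = {"item": "Lum Berry", "evs": {"HP": 4, "Atk": 252, "Spe": 252}}
pursuit_tar = {"item": "Leftovers", "evs": {"HP": 252, "SpA": 200, "SpD": 56}}
mix_lead = {"item": "Leftovers", "evs": {"Atk": 4, "SpA": 252, "Spe": 252}}

for mon in (cb_tar, dd_tar, pursuit_tar, mix_lead):
    print(category(mon))
```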

To look for divisions within categories, I use the concept of synergy again -- not between mons, but between moves. If two moves occupy one slot, say HP Grass/HP Bug in the filler slot of DDtar, then we expect them not to appear together in the same set; in other words, the moves antisynergize (3). Of course, categories will be really cluttered if even the rarest sets were split this way (at this moment I do not treat it as a priority to account for the rare Substitute DDtar), so I only included moves above a probability and count threshold (4). I try to find pairs first that satisfy these conditions, and if there are none, then triplets. These four criteria labelled (1)-(4) determine the categories.

Naming categories is easy when they are split by moves or items, but what about those determined by EVs? I find that most of these sets are still tied to a unique move that mostly does not appear, or at most appears only once on the majority of the other sets. For example, Dragon Dance only exists on Atk/Spe or HP/Spe Tyranitar sets, so we could say that DD almost uniquely characterizes these sets.

Measuring Synergy

What matters is not so much the frequency of the core, but how much more frequently the core appears together than if the constituents were just put together by chance. Official Smogon statistics already do this in calculating teammates. I'll take a slightly different approach based on the concept of Multivariate Pointwise Mutual Information (wiki on PMI, wiki on MMI, paper combining both) from natural language processing, which can be extended beyond pairs into triplet or quad cores. In short, this is a number that is positive when there is synergy (the mons prefer to be teammates), negative when there is antisynergy (they prefer not to be teammates), and zero when the mons appear independently.

Suppose I want to measure how synergistic the quintessential Magneton + Claydol core is. If Magneton and Claydol were independent, then the pair would appear roughly with probability equal to the product of the individual probabilities, P(Magneton) * P(Claydol). A synergistic pair should exceed this probability, i.e. P(Magneton, Claydol) > P(Magneton) * P(Claydol). An antisynergistic pair, like Forretress and Skarmory, where you would tend to use only one as your spiker, should give a lower than expected probability, i.e. P(Forretress, Skarmory) < P(Forretress) * P(Skarmory). It thus makes sense for the synergy of a pokemon pair (X, Y) to be defined by P(X, Y) / [P(X) * P(Y)], which is how many times more often than pure chance the pair appears together. This number is 1 when the pair is independent. To get a number with more sensible properties, namely 0 when independent, positive when synergistic, and negative when antisynergistic, we take the logarithm to obtain the final synergy score = log2{P(X, Y) / [P(X) * P(Y)]}. Similar formulas can be derived for triplet cores, which compare the core probability with those of the constituent pairs and individual mons.

For those who don't want to understand the math, a score of +1/+2/+3 means 2/4/8 times more frequent than expected from combining individual (and lower order) probabilities. Similarly a score of -1/-2/-3 means 2/4/8 times less frequent than expected.
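
As a worked example of the pair formula, here is a minimal sketch on a toy builder of eight made-up teams. The function name `synergy` and the toy data are my own for this illustration; only the pair case is shown, not the triplet extension.

```python
import math


def synergy(teams, x, y):
    """log2 of P(X, Y) / [P(X) * P(Y)] over a list of teams (sets of mons)."""
    n = len(teams)
    p_x = sum(x in team for team in teams) / n
    p_y = sum(y in team for team in teams) / n
    p_xy = sum(x in team and y in team for team in teams) / n
    if p_xy == 0:
        return float("-inf")  # the pair never appears together
    return math.log2(p_xy / (p_x * p_y))


# Toy builder of 8 teams (only the relevant mons are listed)
teams = (
    [{"Magneton", "Claydol", "Skarmory"}] * 2
    + [{"Magneton", "Claydol", "Forretress"}] * 2
    + [{"Skarmory"}] * 2
    + [{"Forretress"}] * 2
)

mag_clay = synergy(teams, "Magneton", "Claydol")      # always paired
mag_skarm = synergy(teams, "Magneton", "Skarmory")    # independent
forre_skarm = synergy(teams, "Forretress", "Skarmory")  # never paired

print(mag_clay, mag_skarm, forre_skarm)
print("Magneton + Claydol appears", 2 ** mag_clay, "times more often than chance")
```

In this toy data Magneton and Claydol each appear on half the teams and always together, so the pair is twice as frequent as chance would predict and scores +1.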

Identifying Archetypes

This is clearly a clustering problem, and the defining feature of this problem is that mons can appear in more than one archetype. Therefore, it makes sense to provide a score that determines the confidence that each mon is in each archetype, so this is the realm of fuzzy clustering algorithms. If we now imagine every set as a node on a network, with the strength of linkages determined by their usage frequency, then the process of finding clusters is intuitively understood: How can we cut the network into some number of completely separate partitions while making sure that we've on average cut only weak links? It turns out that an algorithm called spectral clustering does just that.

I formed an adjacency matrix representing a graph where nodes are the mon sets, and the weights on the edges are the probability that the two nodes co-appear. Then, I used normalized spectral clustering described by Ng, Jordan and Weiss (2002). For the clustering step, I used fuzzy C-means clustering instead of the usual K-means. I determined the number of clusters by finding the point of a sharp fall-off in a plot of the fuzzy partition coefficient vs number of clusters, and tuning the exponent so that the clusters in the data visually had high representation and tightness.
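
The first steps of that pipeline can be sketched in plain Python. This is an illustration only: the function names and toy teams are hypothetical, and the eigenvector computation and the fuzzy C-means step are left out, since in practice they would come from a numerical library. What is shown is the co-appearance adjacency matrix, the symmetric normalization D^(-1/2) A D^(-1/2) used in normalized spectral clustering, and the fuzzy partition coefficient of a membership matrix.

```python
import math
from itertools import combinations


def cooccurrence_matrix(teams):
    """Adjacency matrix whose weights are co-appearance probabilities."""
    nodes = sorted(set().union(*teams))
    index = {name: i for i, name in enumerate(nodes)}
    n_teams = len(teams)
    A = [[0.0] * len(nodes) for _ in nodes]
    for team in teams:
        for x, y in combinations(sorted(team), 2):
            i, j = index[x], index[y]
            A[i][j] += 1.0 / n_teams
            A[j][i] += 1.0 / n_teams
    return nodes, A


def normalized_affinity(A):
    """Symmetric normalization D^(-1/2) A D^(-1/2), D = diagonal of row sums."""
    degree = [sum(row) for row in A]
    size = len(A)
    return [[A[i][j] / math.sqrt(degree[i] * degree[j])
             if degree[i] > 0 and degree[j] > 0 else 0.0
             for j in range(size)]
            for i in range(size)]


def partition_coefficient(U):
    """Fuzzy partition coefficient: mean over points of squared memberships."""
    return sum(u * u for row in U for u in row) / len(U)


# Toy data: placeholder set names, four teams
teams = [{"A", "B"}, {"A", "B"}, {"C", "D"}, {"A", "C", "D"}]
nodes, A = cooccurrence_matrix(teams)
L = normalized_affinity(A)

print(nodes)
for row in L:
    print([round(v, 3) for v in row])
print(partition_coefficient([[1.0, 0.0], [0.0, 1.0]]))  # crisp memberships
print(partition_coefficient([[0.5, 0.5], [0.5, 0.5]]))  # fully fuzzy memberships
```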

Core Rankings - Synergy and Frequency

Cores can either be strongly or weakly synergistic (given by the synergy score), and they can be frequent or infrequent (given by the frequency label). The final rankings that you see in the files labelled “_synergy_sets_statistics” are a combination of both – they are determined by a weighted product with an exponent. Weighting synergy too much may cause very specific cores of BL mons to appear at the top, e.g. Sun teams, while weighting frequency too much gives us not much more information than single mon frequencies.
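
To show how such a trade-off behaves, here is a sketch of one possible weighted product, freq^alpha * synergy^(1 - alpha). The exact form and the value of the exponent used by the program are not stated here, so both `rank_cores` and `alpha` are assumptions for illustration. The three rows reuse frequency and synergy values from the 2-core table further down.

```python
def rank_cores(cores, alpha=0.5):
    """Rank (name, freq, synergy) rows by freq**alpha * synergy**(1 - alpha).

    alpha = 1 ranks purely by frequency, alpha = 0 purely by synergy.
    Negative synergies are clipped to 0 (an assumption of this sketch).
    """
    def score(core):
        _, freq, synergy = core
        return (freq ** alpha) * (max(synergy, 0.0) ** (1.0 - alpha))
    return sorted(cores, key=score, reverse=True)


cores = [
    ("HP/SpA Magneton + Milotic", 6.224, 2.88),
    ("Band Aerodactyl + HP/SpA Crunch Tyranitar", 6.224, 1.29),
    ("Band Metagross + Atk/Spe HP Grass DD Tyranitar", 4.149, 2.49),
]

by_frequency = [name for name, _, _ in rank_cores(cores, alpha=1.0)]
balanced = [name for name, _, _ in rank_cores(cores, alpha=0.5)]
by_synergy = [name for name, _, _ in rank_cores(cores, alpha=0.0)]

for label, order in (("frequency", by_frequency),
                     ("balanced", balanced),
                     ("synergy", by_synergy)):
    print(label, "->", order)
```

With an equal weighting, the less frequent but more synergistic Metagross + Tyranitar core overtakes Aerodactyl + Tyranitar, which is the kind of reordering the combined ranking is meant to produce.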

Archetypes - Confidence and Frequency

The files labelled “_archetype_statistics” are the results of an attempt to categorize synergies into broad archetypes. These archetypes are completely blind to human input, meaning I did not ask for specific criteria to be met such as requiring Tyranitar in TSS. Each archetype is labelled with a number and is tagged to a ranked list, and the rankings are again determined by two factors. This time, one of the indicators is the level of confidence that the set is in the archetype. This is a number from 0 to 1, and can be thought of in percentages where the higher the number, the more likely this mon is in the archetype. The other indicator is frequency in the entire builder (not in the archetype!). Just as I did with the cores, the final rankings are a combination of both, a weighted product. Those at the top are most likely to be in the core.

Importantly, this yields archetype statistics (in csv and txt format) that tell you independently how frequently the mons appear and how likely they are to be in a particular archetype.

Code:

Archetype 3
Counts | Freq (%) | Confidence | Pokemon
     44 |   18.257 |       0.59 |             HP/SpD Whirlwind Skarmory
     21 |    8.714 |       0.74 |                     HP/Def ST Blissey
     14 |    5.809 |       0.84 |                 Def/SpA TW SB Blissey
     21 |    8.714 |       0.59 |          HP/Def Surf Protect Swampert
     16 |    6.639 |       0.71 |                HP/Atk FP RS Tyranitar
     16 |    6.639 |       0.62 |           HP/Atk Explosion RS Claydol
     23 |    9.544 |       0.36 |              Def/SpA Toxic SB Blissey
     44 |   18.257 |       0.21 |              SpA/Spe HP Fire Magneton
     12 |    4.979 |       0.47 |               HP/Spe Taunt WOW Gengar
     18 |    7.469 |       0.34 |                               Moltres
      5 |    2.075 |       0.81 |               Def/SpA Wish SB Blissey
     12 |    4.979 |       0.40 |                    SpA/Spe HP Starmie

Code:

Archetype 4
Counts | Freq (%) | Confidence | Pokemon
     27 |   11.203 |       0.96 |                        Band Metagross
     19 |    7.884 |       0.97 |         Atk/Spe HP Grass DD Tyranitar
     21 |    8.714 |       0.89 |                  SpA/Spe FB Salamence
     22 |    9.129 |       0.75 |           SpA/Spe Substitute Swampert
     19 |    7.884 |       0.64 |   SpA/Spe HP Grass Thunderbolt Zapdos
     21 |    8.714 |       0.57 |              HP/SpA Psychic Metagross
     12 |    4.979 |       0.77 |                  Atk/SpD DD Salamence
     11 |    4.564 |       0.66 |           Atk/Spe HP Bug DD Tyranitar
     17 |    7.054 |       0.41 |                 Atk/SpD FP SD Snorlax
      6 |    2.490 |       0.84 |                              Ludicolo
      6 |    2.490 |       0.72 |              SpA/Spe HP Grass Jirachi
      5 |    2.075 |       0.81 |                               Breloom

Also, here are core statistics that tell you separately how frequently the mons appear and how synergistic they are.

Code:

2-Cores Arranged by Frequency
Counts | Freq (%) | Synergy | Cores
     15 |    6.224 |    2.88 |                       HP/SpA Magneton,                               Milotic
     14 |    5.809 |    2.19 |                            Forretress,               HP/SpA Crunch Tyranitar
     10 |    4.149 |    2.84 |             SpA/Spe HP Grass Magneton,         Atk/SpD Earthquake SD Snorlax
     10 |    4.149 |    2.49 |                        Band Metagross,         Atk/Spe HP Grass DD Tyranitar
     15 |    6.224 |    1.29 |                       Band Aerodactyl,               HP/SpA Crunch Tyranitar
     15 |    6.224 |    1.11 |                  HP/SpD Roar Skarmory,               HP/SpA Crunch Tyranitar
     10 |    4.149 |    1.95 |              SpA/Spe HP Fire Magneton,                        Band Salamence
     11 |    4.564 |    1.60 |                   SpA/Spe FP Swampert,     SpA/Spe HP Ice Thunderbolt Zapdos

Code:

3-Cores Arranged by Frequency
Counts | Freq (%) | Synergy | Cores
      7 |    2.905 |    1.91 |        HP/SpD HP Grass Recover Celebi,                       HP/SpA Magneton,                               Milotic
      4 |    1.660 |    2.06 |                      HP/Def LS Celebi,                       HP/SpA Magneton,                               Milotic
      2 |    0.830 |    4.38 |              HP/SpD SD Recover Celebi,         Atk/Spe RS Earthquake Claydol,             HP/Spe HP Grass Metagross
      4 |    1.660 |    1.54 |                  Atk/SpD DD Salamence,             SpA/Spe Endeavor Swampert,         Atk/Spe HP Grass DD Tyranitar

The results were quite astonishingly intuitive, and I used them to analyze the ADV OU metagame in this analysis -> https://www.smogon.com/forums/threa...pes-and-cores-a-data-driven-approach.3654874/ . I encourage you to read it to get a feel for how powerful these statistics can be in understanding the metagame.

Builder Sorter

With all these stats, we can finally get down to sorting. I realized from speaking to people that everyone has their own way of sorting teams, so this may not be very relevant, but it can be good if you're starting out in a tier and trying to make your way from standard to unconventional stuff by sorting by core. If you have a habit of naming your teams but not sorting them, this is a way to get them into alphabetical order. Or, you could be someone who plays a huge number of tiers and desperately wants your builder to look neat. I'm very open to suggestions on sorting -- please let me know either in this thread or by PM!

At the moment, I follow the following sorting hierarchy (with options to enable/disable):

Unsorted Unfinished teams
Finished teams
- Sort across metagame (alphabetical, usage)
  - Sort across folders (alphabetical, usage)
    - Sort across teams (lead usage)
      - Sort across teams (alphabetical, n-core usage)
        
        Sort within team (usage, color)
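
A nested hierarchy like this maps naturally onto a tuple sort key in Python. The sketch below is only an illustration with made-up teams and hypothetical field names; it covers the metagame, folder, lead usage and name levels, not the n-core or color options.

```python
from collections import Counter

# Made-up builder entries; the first mon of each team is treated as the lead.
teams = [
    {"format": "gen3ou", "folder": "TSS", "name": "b team",
     "mons": ["Skarmory", "Blissey"]},
    {"format": "gen3ou", "folder": "TSS", "name": "a team",
     "mons": ["Zapdos", "Skarmory"]},
    {"format": "gen3ou", "folder": "Offense", "name": "c team",
     "mons": ["Zapdos", "Metagross"]},
    {"format": "gen2ou", "folder": "", "name": "d team",
     "mons": ["Snorlax", "Zapdos"]},
]

# How often each mon is used as a lead across the whole builder
lead_usage = Counter(team["mons"][0] for team in teams)


def sort_key(team):
    return (
        team["format"],                 # metagame, alphabetical
        team["folder"],                 # folder, alphabetical
        -lead_usage[team["mons"][0]],   # most used leads first
        team["name"],                   # team name, alphabetical
    )


sorted_names = [team["name"] for team in sorted(teams, key=sort_key)]
print(sorted_names)
```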

Here are some results! It's really clear that standard teams go on top and weird stuff goes at the bottom. Perhaps the most mysterious thing is what sorting by n-core really means. I'm not observant enough to find qualitative differences in sorting by the number of mons in a core, but maybe one of you can tell me.

Sorted by 3-core usage then by mon frequency -- Left = top of list, Right = bottom of list. Full builder here: https://pokepast.es/3f07acf85ad4ae84

Sorted by lead then 2-core usage then color

Some cool rainbow color sorts:

Testing the Code on Other Generations

I chanced upon a team dump web scraping bot here and tested the code on it. As the bot is not perfect, I sanitized the team by throwing it into showdown and pulling it out again to get the packaged format, and then doing some filtering of my own to remove nonsense entries. Here's the paste after doing that. I ran the code on that paste and got the results in the images below. There seems to be some consistency in the gen6ou sort order and stats. I wasn't able to do the gen7ou format because there were way too many teams for my RAM to handle. Note: the code should only take a few seconds to run, so if it becomes very slow, it's a sign that you don't have enough RAM (I had about 2GB available for this program, which was enough to handle the gen6ou one but not gen7ou), so you might have to split the builder up by generation or folder.

Installation and Usage Details

The code is written in Python 3 (without requiring numpy, unlike the previous version), so you can just pull from github here and run it from the command line if you already have Python and are familiar with git. If you've used some command line and just want to install a lean python, go for the first spoiler. If you really don't want to use the command line because you aren't familiar with computer stuff, open the second spoiler. It takes up some space on your computer but it should be easy to use.

1. Download the latest version of Python 3 from https://www.python.org/downloads/ and read step 2 before installing!
2. Open the installation file and Add Python to PATH by clicking the red check box

3. Download the repository (whole folder) from here by clicking clone/download -> download zip. Place your builder in the same folder as main.py
4. Read the README from here and edit main.py using a text editor of your liking to configure your settings, and save main.py . Further installations are needed from the README to do archetype analysis.
5. Open the command line (windows: search for cmd in start menu, mac: applications > utilities > terminal)
6. Type cd "directory_of_main.py" (if main.py is C:/Users/User/My Documents/BuilderAnalyzer/main.py, type cd "C:/Users/User/My Documents/BuilderAnalyzer")
6a. If you're on a mac, you might need to enter chmod +x main.py
7. Type python main.py and press enter
8. Watch new files pop up in the same directory and profit!

Edit: Foolproof Installation instructions for Python:
1. Download the Python 3.x (not Python 2.x) version of Anaconda from here
2. Download the repository (whole folder) from here by clicking clone/download -> download zip
3. Place your builder in the same folder as main.py
4. Search your start menu for Jupyter Notebook and run it. A GUI local directory should open in your web browser
5. Navigate to the folder containing main.py
6. At the top right hand corner, click New -> Notebooks -> Python
7. Copy and paste the code from main.py (open it in notepad or something) into the notebook
8. Read the README from here to configure your sort settings. Further installations are needed from the README to do archetype analysis.
9. Shift+Enter to run the code.
10. Watch new files pop up in the same directory and profit!

Closing Thoughts

I'm pretty open to suggestions on changes and additional features as I think they can benefit a lot of people. Please leave your comments in the thread, and let me know of any bugs should you try it out! Shoutout to Altina, Sadlysius, Watchog and especially xJoelituh for taking time to help with testing!

vapicuno · May 5, 2019

I've updated the above post with instructions for installing the relevant Python packages to run the code, for people who aren't well versed in running Python scripts. It'll stay there until I find a way to easily make this available on a server or decide to turn it into an executable.

vapicuno · Jun 17, 2019

Major updates: I've now added the ability to aggregate similar sets together, provide core statistics of the teambuilder, and sort the teambuilder (see original post!)

vapicuno · Sep 29, 2019

Major updates added to the OP under Builder Statistics:
1. The program now calculates synergies (which is not just based on usage but how much more frequently the mons appear together than independently) and uses them to identify synergistic cores.
2. Archetype analysis (clustering) has now been added. The program can identify archetypes from the teams it is given.

The fruits of this work are most easily seen in a very extensive analysis that I have done on the ADV OU metagame ->https://www.smogon.com/forums/threa...pes-and-cores-a-data-driven-approach.3654874/ . I encourage you to read it just to get a feel for what these statistics can do.

I'll be happy to help if any of you want to run this code to do your own metagame analysis. As a precaution, I've not tested it on other gens because I don't play them, but it would be interesting nevertheless to try it out.

wyc2333 · Sep 29, 2019

instead of set aggregator, an updated set dump may be more attractive

Rezzo · Oct 21, 2021

For anybody who still may want to run this, I've uploaded an updated 'main.py' here that'll download the Pokédex from the new Smogon GitHub location, just copy and paste the text into your local file while removing all of the old script.

In your 'BuilderAnalyzer' folder, you must also change these file extensions to the following:
abilities.js -> abilities.ts
items.js -> items.ts
moves.js -> moves.ts
pokedex.js -> pokedex.ts
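
If you'd rather not rename the four files by hand, a short script along these lines should do it. This is an unofficial sketch: `rename_data_files` is a hypothetical helper, and the demonstration runs on a throwaway temporary folder, so point it at your own 'BuilderAnalyzer' folder to use it for real.

```python
import tempfile
from pathlib import Path

DATA_FILES = ("abilities", "items", "moves", "pokedex")


def rename_data_files(folder):
    """Rename abilities/items/moves/pokedex from .js to .ts inside folder."""
    renamed = []
    for stem in DATA_FILES:
        old = Path(folder) / f"{stem}.js"
        if old.exists():
            new = old.with_suffix(".ts")
            old.rename(new)
            renamed.append(new.name)
    return renamed


# Demonstration on a temporary folder with empty placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for stem in DATA_FILES:
        (Path(tmp) / f"{stem}.js").touch()
    result = rename_data_files(tmp)
    remaining_js = sorted(p.name for p in Path(tmp).glob("*.js"))

print(result)
```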

vapicuno please feel free to grab the code and upload it, or if you just want the snippet with the updated URLs, it's lines 105-126.

Very powerful tool, by the way!
