Poisson Processes, Chapter 3¶

by John Kingman¶

PoissonPokemon Processes, Chapter 3 1¶

by John KingmanJohto Kingler¶

Outline¶

1. The Combat Power Multiplier (CPM)Monotonic Nonlinearities!¶

2. Combat PowerNavigating Pareto Frontiers!¶

3. Catching PokemonGeometric Series, Monte Carlo Simulations, and Poisson Processes!¶

1. Core Monotonic Nonlinearity: The CPM¶

We will start off both of the subsequent sections with this quantity!

In [89]:

import pandas as pangoro
cpm = pangoro.read_csv('cp_multiplier.txt', delimiter="\t")
cpm['$CPM^2$'] = cpm['CP Multiplier']**2
cpm['$CPM$'] = cpm['CP Multiplier']
plt.figure(figsize=(12,2.5))
cpm.plot('Level', '$CPM$', ax=plt.gca(), lw=3)
cpm.plot('Level', '$CPM^2$', ax=plt.gca(), lw=3)
plt.axvline(30, ls='--', color='black', alpha=0.5)
plt.xlabel('Pokémon Level')
plt.ylabel('CP Multiplier Quantity')
plt.title('The Combat Power Multiplier (CPM)')
plt.ylim(0, 1); plt.xlim(1, 40)
plt.xticks([1,5,10,15,20,25,30,35,40]); plt.show()

2. Pareto Frontiers in Pokémon Combat¶

2.1 How CPM is used in calculating Combat Power (CP)¶

$$ \begin{split} Atk &= (BaseAtk + IndividualAtk) * CPM^2 \\ Def &= (BaseDef + IndividualDef) * CPM^2 \\ Sta &= (BaseSta + IndividualSta) * CPM^2 \\ CP &= \mathtt{np.floor}\left(\frac{1}{10}\left(Atk * \sqrt{Def} * \sqrt{Sta}\right)\right) \end{split} $$

Let's visualize some of these statistics!

In [93]:

# Loading the dataset

dex = pangoro.read_json('pokedex.json')
dex = dex[~dex.name.str.contains('mega')]
dex = dex[~dex.name.str.contains('primal')]
dex = dex[~dex.name.str.contains('forme')]
dex = dex[~dex.name.str.contains(' ')]
dex['baseSta'] = dex.baseStm

dex['max_cp_at_lvl40'] = np.floor(
    0.1 *
    (dex.baseAtk+15) *
    np.sqrt(dex.baseDef+15) * 
    np.sqrt(dex.baseStm+15) * 
    cpm['CP Multiplier'].max()**2).astype(int)

dex['min_cp_at_lvl40'] = np.floor(
    0.1 *
    (dex.baseAtk+1) *
    np.sqrt(dex.baseDef+1) * 
    np.sqrt(dex.baseStm+1) * 
    cpm['CP Multiplier'].max()**2).astype(int)

max_cp = {}
for name, cp in dex[['name', 'max_cp_at_lvl40']].values:
  max_cp[name] = cp

dex['abs_cp_diff'] = dex['max_cp_at_lvl40'] - dex['min_cp_at_lvl40']
dex['pct_cp_diff'] = (dex['max_cp_at_lvl40'] - dex['min_cp_at_lvl40']) / dex['max_cp_at_lvl40']

2.2 Visualizing the Pidgeyotto Frogadier¶

2.2.1 Stats vs. other stats¶

In [103]:

featured_pokemon = ['chansey', 'blissey', 'lugia', 'slaking',
                    'shedinja', 'garchomp', 'kingler', 'cresselia',
                    'mewtwo', 'anorith', 'shuckle', 'salamence']

def compare(attr1, attr2):
  plt.xlabel(attr1)
  plt.ylabel(attr2)
  plt.scatter(dex[attr1], dex[attr2], edgecolors='black', alpha=0.75)
  on_front = get_pareto_frontier(np.array([dex[attr1], dex[attr2]]).T)
  on_front = on_front[on_front[:,0].argsort()]
  plt.plot(on_front[:,0], on_front[:,1], color='green', lw=3, alpha=0.5)
  for fp in featured_pokemon:
    x = dex[dex.name==fp][attr1].values[0]
    y = dex[dex.name==fp][attr2].values[0]
    plt.scatter(x, y, color='red', alpha=0.5, s=100)
    txt = plt.text(x, y, fp, ha='center', fontsize=12, fontweight='bold')
    txt.set_path_effects([PathEffects.withStroke(linewidth=2, foreground='w')])
    
plt.figure(figsize=(18,6))
plt.subplot(131, title='Atk vs Def'); compare('baseAtk', 'baseDef')
plt.subplot(132, title='Atk vs Sta'); compare('baseAtk', 'baseSta')
plt.subplot(133, title='Def vs Sta'); compare('baseDef', 'baseSta')
plt.tight_layout(); plt.show()

2.2.2 Pokémon with lowest max CP¶

In [95]:

dex.sort_values('max_cp_at_lvl40')[['name', 'min_cp_at_lvl40', 'max_cp_at_lvl40', 'baseAtk', 'baseDef', 'baseSta']][:10]

Out[95]:

	name	min_cp_at_lvl40	max_cp_at_lvl40	baseAtk	baseDef	baseSta
455	magikarp	161	274	29	85	85
235	feebas	161	274	29	85	85
42	azurill	230	364	36	71	137
329	happiny	215	371	25	77	225
776	sunkern	265	395	55	55	102
395	kricketot	266	401	45	74	114
715	shuckle	207	405	17	396	85
739	smeargle	284	431	40	83	146
377	kakuna	290	432	46	75	128
141	cosmog	293	432	54	57	125

2.2.3 Pokémon with highest max CP¶

In [96]:

dex.sort_values('max_cp_at_lvl40')[['name', 'min_cp_at_lvl40', 'max_cp_at_lvl40', 'baseAtk', 'baseDef', 'baseSta']][-10:][::-1]

Out[96]:

	name	min_cp_at_lvl40	max_cp_at_lvl40	baseAtk	baseDef	baseSta
730	slaking	4796	5308	319	183	312
642	regigigas	4686	5182	315	231	242
498	mewtwo	4502	4995	330	200	235
399	kyogre	4432	4914	297	250	225
316	groudon	4432	4914	297	250	225
908	zekrom	4342	4820	302	232	225
647	reshiram	4342	4820	302	232	225
178	dialga	4342	4820	302	232	225
747	solgaleo	4341	4818	280	210	289
440	lunala	4341	4818	280	210	289

2.2.4 Pokémon with highest relative difference in max/min CP¶

In [97]:

dex.sort_values('pct_cp_diff')[['name', 'min_cp_at_lvl40', 'max_cp_at_lvl40', 'baseAtk', 'baseDef', 'baseSta']][-10:][::-1]

Out[97]:

	name	min_cp_at_lvl40	max_cp_at_lvl40	baseAtk	baseDef	baseSta
715	shuckle	207	405	17	396	85
329	happiny	215	371	25	77	225
235	feebas	161	274	29	85	85
455	magikarp	161	274	29	85	85
42	azurill	230	364	36	71	137
474	marill	302	461	37	93	172
739	smeargle	284	431	40	83	146
395	kricketot	266	401	45	74	114
776	sunkern	265	395	55	55	102
377	kakuna	290	432	46	75	128

2.3 Max CP isn't the full story¶

In [106]:

from scipy.stats import pearsonr

tdo = pangoro.read_csv('tdo.csv')
tdo['NAME'] = [name.lower() for name in tdo['NAME']]
max_off_tdo = tdo.groupby('NAME')['OFF TDO'].max()
max_def_tdo = tdo.groupby('NAME')['DEF TDO'].max()
max_dps = tdo.groupby('NAME')['DPS'].max()

pokemon = set(dex['name']).intersection(set(tdo['NAME']))
fig = plt.figure(figsize=(18,6))
for i, (max_tdo, tdo_type) in enumerate([(max_off_tdo, 'Offensive TDO'), (max_def_tdo, 'Defensive TDO'), (max_dps, 'DPS')]):
  cps = [max_cp[n] for n in pokemon]
  tdos = [max_tdo[n] for n in pokemon]
  
  on_front = get_pareto_frontier(np.array([cps, tdos]).T)
  on_front = on_front[on_front[:,0].argsort()]
  corr = pearsonr(cps, tdos)[0]
  plt.subplot(1,3,i+1, title=tdo_type+ ' vs Max CP ($\\rho$={:.2f})'.format(corr))
  plt.scatter(cps, tdos, edgecolors='black')
  plt.plot(on_front[:,0], on_front[:,1], color='green', lw=3, alpha=0.75)
  for n in featured_pokemon:
    plt.scatter(max_cp[n], max_tdo[n], color='red', alpha=0.5, s=100)
    txt = plt.text(max_cp[n], max_tdo[n], n, ha='center', fontsize=12, fontweight='bold')
    txt.set_path_effects([PathEffects.withStroke(linewidth=2, foreground='w')])
  plt.xlabel('Max CP')
  plt.ylabel(tdo_type)

plt.tight_layout()

3 Towards Pokémon Poisson Processes¶

3.1 How CPM is used in calculating Catch Probability¶

$$ \begin{split} P(ThrowFails) &= \left(1 - \frac{BaseCatchRate}{2 * CPM}\right)^{Ball∗Curve∗Berry∗Throw∗Medal} \\ Ball &= \begin{cases} 2.0 & \text{ if } Ultra Ball \\ 1.5 & \text{ if } Great Ball \\ 1.0 & \text{ otherwise } \\ \end{cases} \\ Curve &= \begin{cases} 1.7 & \text{ if } Curveball \\ 1.0 & \text{ otherwise } \end{cases} \\ Berry &= \begin{cases} 2.5 & \text{ if } Golden Razz \\ 1.5 & \text{ if } Razz Berry, Silver Pinap \\ 1.0 & \text{ otherwise } \end{cases} \\ Throw &= \begin{cases} 2.0 - InnerRadius & \text{ if } WithinRadius \\ 1.0 & \text{ otherwise } \end{cases} \\ Medal &= \begin{cases} 1.3 & \text{ if } Gold \\ 1.2 & \text{ if } Silver \\ 1.1 & \text{ if } Bronze \\ 1.0 & \text{ otherwise } \end{cases} \\ \end{split} $$

Experimental data and p-values can been viewed at gamepress.

3.2 Catch probability over multiple throws¶

Note that rigorous experimental evidence has established that flee rate has no dependence on CP:

$$ P(PokemonFlees) = BaseFleeRate $$

Letting

$$\begin{split} P &\equiv 1 - P(ThrowFails),\\ F &\equiv BaseFleeRate, \end{split}$$ we can therefore write $$\begin{split} P(CatchEventually) &= P + P(1-P)(1-F) + P(1-P)^2(1-F)^2 + \cdots\\ &= P \sum_{i=0}^\infty ((1-P)(1-F))^i \\ &= P \left(\frac{1}{1 - ((1-P)(1-F))}\right) \\ &= \frac{P}{P+F-PF} \end{split}$$

Of course, $P$ is actually a random variable, because its value depends on the success of the throw. So this is not quite accurate. How can we analyze this?

3.3 Mankey luCario Simulation¶

The main issue is that we often miss the inner circle, which creates significant variance in the value of $P$ for each throw. We can simulate this process by assuming a Gaussian noise model in the $(x,y)$ location of our throw. Let us further assume that we always attempt to shoot a curveball when the circle is at one-quarter radius, but that sometimes we get our timing a little off (again with a Gaussian error).

This lets us write down the following generative process:

$$ \begin{split} Throw_x, Throw_y &\sim \mathcal{N}(0, \sigma^2_{xy}) \\ InnerRadius &\sim \mathcal{N}(0.25, \sigma^2_{t}) \\ WithinRadius &= \mathcal{I}\left[\sqrt{Throw_x^2 + Throw_y^2} \leq InnerRadius\right]\\ Throw &= WithinRadius * (2 - InnerRadius) + (1 - WithinRadius) \end{split} $$

which we can plug back in to our original formula for $P$.

In [99]:

sigma_xy = 0.2   # our accuracy is pretty good
sigma_t = 0.2     # our timing might be off by 0.2
F = 0.1           # 10% flee rate
base_miss = 0.6   # 60% miss rate before throw applied

def sample_xyr():
  x = np.random.randn() * sigma_xy
  y = np.random.randn() * sigma_xy
  r = 0.25 + np.random.randn() * sigma_t
  if r > 1: r = r-1
  if r < 0: r = 1-abs(r)
  return x,y,r
 
def sample_throw():
  x,y,r = sample_xyr()
  return 2 - r if np.sqrt(x**2 + y**2) <= r else 1
  
def sample_P():
  return 1-np.power(base_miss, sample_throw())
  
def sample_catch():
  while True:
    caught = (np.random.rand() < sample_P())
    if caught: return 1
    fled = np.random.rand() < F
    if fled: return 0
    
xyrs = [sample_xyr() for _ in range(1000)]
throws = [sample_throw() for _ in range(1000)]
probs = [sample_P() for _ in range(1000)]
catches = [sample_catch() for _ in range(1000)]

In [100]:

plt.figure(figsize=(16,4.25))
plt.subplot(141, title='Throws & Radii')
plt.scatter([x for x,y,r in xyrs], [y for x,y,r in xyrs], alpha=0.5, edgecolors='black')
plt.gca().add_artist(plt.Circle((0,0), 0.25, color='black', fill=False, ls='--', lw=3))
for x,y,r in xyrs[:100]:
  plt.gca().add_artist(plt.Circle((0,0), r, color=(0,0,0,0.1), fill=False, ls='--'))
plt.xlim(-1,1); plt.ylim(-1,1)
plt.subplot(142, title='Throw Multipliers')
plt.hist(throws, bins=15)
plt.subplot(143, title='Catch Probabilities')
plt.hist(probs, bins=8)
plt.xlim(0,1)
plt.subplot(144, title='Success/Failure')
plt.hist(catches, bins=2)
plt.xticks([0.25, 0.75], ['Pokémon\nFled', 'Eventually\nCaught'])
plt.tight_layout()
plt.show()

3.4 But Wait! What about Pokemon Attacks?¶

One thing we didn't factor into our Monte Carlo simulation is whether the Pokemon executes a Jump or Attack animation. When this happens, our throw fails.

Now, a failed throw in this case doesn't actually cause wild Pokémon to flee. However, this is a significant issue for Raid Bosses, because in those cases, we have a limited supply of Poké Balls.

It turns out that the time between Pokémon attacks is well-modeled by a Geometric distribution, the discrete analogue of an Exponential:

hey (source: gamepress)

3.5 But doesn't that mean...¶

Yes! That means Pokémon attacks are well-modeled by a Poisson process!

3.6 Live Poisson-Gamma Experiment!¶

Alright folks! We are going to attempt to model the rate at which different wild Pokémon have attack or jump animations.

Let's:

wait for a particular wild Pokémon to appear
all tap on that Pokémon to enter the catch screen
set a timer
count the number of times the Pokémon jumps or attacks (track them separately)

Afterwards, we'll count up the number of jumps and attacks we observe in a fixed time interval (and subtract the length of the animations) to obtain data about the rate. Assuming a simple Gamma prior, we can update it to get a posterior estimate of the rate parameter (whose true value I know).

In [101]:

results = [
  (6, 7), # Weiwei
  (7, 6), # Javier
  (4, 3), # Ike
  (6, 7), # Melanie
  (5, 7), # Joe
  (3, 5)  # Andrew
]

# Assume prior of 1/6
prior_count = 1
prior_time = 6

posterior_count = (prior_count + sum(a+j for j,a in results))
posterior_time = (prior_time + sum([60-2*a-1*j for j,a in results]))

Statistics works!!¶

In [102]:

shape, rate = posterior_count, posterior_time
def pdf(x):
  return (rate**shape/gamma(shape)) * x**(shape-1) * np.exp(-1 * rate * x)

x = np.linspace(0, 1, 100)
y = [pdf(x_i) for x_i in x]
plt.title('Experimental results')
for i,(j,a) in enumerate(results):
  plt.axvline((a+j) / (60-2*a-1*j), alpha=0.25, ls='--', label=(None if i else 'Experimental data'))
plt.plot(x,y, lw=3, label='Posterior estimate')
plt.axvline(.2 / .77, color='black', ls='--', label='True value')
plt.legend(loc='best')
plt.xlabel('Possible value of rate parameter')
plt.ylabel('Probability density')
plt.show()

Conclusion: Pokémon Stop¶

Let's get back to work >_<