Reputation: 25
Currently trying to calculate a ratio for a dataset that looks something like this:
Country A | Country B | Migrants from A to B
foo bar 123
foo qux 221
bar qux 133
qux foo 312
bar foo 222
This dataset is a pandas dataframe. My objective is to calculate the ratio of migration from a country to another. For example the ratio of migration from 'foo' to 'bar' over 'bar' to 'foo'. In this case it would be 123/222 = 0.55
In addition if possible grouping them together in either a single dataset or multiple subsets, for example, in the following fashion:
Country A | Country B | A to B ratio
foo bar 0.55
bar foo 1.88
foo qux 0.71
qux foo 1.41
.. .. ..
How is this possible to accomplish using pandas, numpy, etc..?
Been trying to group them like so (although I can't even begin to rationalize):
def calcOutToInRatio(self, data):
oiRatio = 0.0
dfMig = data.groupby(['A','B'], as_index=False)['Migration'].mean()
dfMigRev = data.groupby(['B','A'], as_index=False)['Migration'].mean()
dfGrouped = dfMig.loc[(dfMig['A'] == dfMigRev['B']) & (dfMigRev['A'] == dfMig['B'])]
print(dfGrouped)
return oiRatio
Can't event think of possibilities due to my lack of knowledge in pandas operations.. Any advice would be helpful, even if there's an ugly workaround this issue. Thanks!
Upvotes: 0
Views: 91
Reputation: 1126
df.columns = ['A','B','AtoB']
df1 = df.pivot_table('AtoB', index = ['A'], columns = ['B'])
df2 = pd.DataFrame(df1.to_numpy()/df1.to_numpy().T, columns = df1.columns, index = df1.index)
df2
Out[1]:
B bar foo qux
A
bar NaN 1.804878 NaN
foo 0.554054 NaN 0.708333
qux NaN 1.411765 NaN
df2 = df2.reset_index().melt(id_vars = ['A'],value_vars = df1.columns, value_name='ratio')
df2.drop(df2[df2['A']==df2['B']].index)
Out[2]:
A B ratio
1 foo bar 0.554054
2 qux bar NaN
3 bar foo 1.804878
5 qux foo 1.411765
6 bar qux NaN
7 foo qux 0.708333
Upvotes: 1
Reputation: 30920
Use DataFrame.pivot_table
to map with DataFrame.lookup
:
mapper=df.pivot_table(index='CountryA',columns='CountryB',values='MigrantsfromAtoB')
df['ratio']=df['MigrantsfromAtoB']/mapper.lookup(df['CountryB'],df['CountryA'])
print(df)
CountryA CountryB MigrantsfromAtoB ratio
0 foo bar 123 0.554054
1 foo qux 221 0.708333
2 bar qux 133 NaN
3 qux foo 312 1.411765
4 bar foo 222 1.804878
Upvotes: 2
Reputation: 17834
You can merge
the table to itself (self join in SQL):
df.columns = ['Country_A', 'Country_B', 'A_to_B']
df1 = pd.merge(df, df, left_on=['Country_A', 'Country_B'], right_on=['Country_B', 'Country_A'])
df['ratio'] = df1['A_to_B_x'] / df1['A_to_B_y']
Upvotes: 3
Reputation: 430
You can do it by loop but @ansev's answer is more logical:
df = pd.DataFrame({
"A": ["foo", "foo", "bar", "qux", "bar", "qux"], "B": ["bar", "qux", "qux",
"foo", "foo", "bar"], "AtoB": [123, 221, 133, 312, 222, 444]
})
df["duals"] = df["A"] + df["B"]
df["inv_duals"] = df["B"] + df["A"]
for i in range(len(df)):
df.loc[i,"AtoB_Ratio"] = df.loc[i, "AtoB"] / df.loc[df["duals"].isin([df.loc[i, "inv_duals"]]), "AtoB"].item()
df = df.drop(["duals", "inv_duals"], axis=1)
Output:
A B AtoB AtoB_Ratio
0 foo bar 123 0.554054
1 foo qux 221 0.708333
2 bar qux 133 0.299550
3 qux foo 312 1.411765
4 bar foo 222 1.804878
5 qux bar 444 3.338346
Upvotes: 1