tegraze
tegraze

Reputation: 25

Pandas Dataframe Groupby where column A value == Column B value

Currently trying to calculate a ratio for a dataset that looks something like this:

Country A | Country B | Migrants from A to B
   foo         bar           123
   foo         qux           221
   bar         qux           133
   qux         foo           312
   bar         foo           222    

This dataset is a pandas dataframe. My objective is to calculate the ratio of migration from a country to another. For example the ratio of migration from 'foo' to 'bar' over 'bar' to 'foo'. In this case it would be 123/222 = 0.55

In addition if possible grouping them together in either a single dataset or multiple subsets, for example, in the following fashion:

Country A | Country B |  A to B ratio
   foo         bar           0.55
   bar         foo           1.88
   foo         qux           0.71
   qux         foo           1.41
   ..           ..            ..  

How is this possible to accomplish using pandas, numpy, etc..?

Been trying to group them like so (although I can't even begin to rationalize):

    def calcOutToInRatio(self, data):
        oiRatio = 0.0
        dfMig = data.groupby(['A','B'], as_index=False)['Migration'].mean()
        dfMigRev = data.groupby(['B','A'], as_index=False)['Migration'].mean()
        dfGrouped = dfMig.loc[(dfMig['A'] == dfMigRev['B']) & (dfMigRev['A'] == dfMig['B'])]
        print(dfGrouped)
        return oiRatio

Can't event think of possibilities due to my lack of knowledge in pandas operations.. Any advice would be helpful, even if there's an ugly workaround this issue. Thanks!

Upvotes: 0

Views: 91

Answers (4)

Alex
Alex

Reputation: 1126

df.columns = ['A','B','AtoB']

df1 = df.pivot_table('AtoB', index = ['A'], columns = ['B'])
df2 = pd.DataFrame(df1.to_numpy()/df1.to_numpy().T, columns = df1.columns, index = df1.index)
df2

Out[1]:
  B      bar       foo          qux
  A         
bar     NaN        1.804878     NaN
foo     0.554054   NaN          0.708333
qux     NaN        1.411765     NaN


df2 = df2.reset_index().melt(id_vars = ['A'],value_vars = df1.columns,  value_name='ratio')
df2.drop(df2[df2['A']==df2['B']].index)

Out[2]:
    A   B   ratio
1   foo bar 0.554054
2   qux bar NaN
3   bar foo 1.804878
5   qux foo 1.411765
6   bar qux NaN
7   foo qux 0.708333

Upvotes: 1

ansev
ansev

Reputation: 30920

Use DataFrame.pivot_table to map with DataFrame.lookup:

mapper=df.pivot_table(index='CountryA',columns='CountryB',values='MigrantsfromAtoB')
df['ratio']=df['MigrantsfromAtoB']/mapper.lookup(df['CountryB'],df['CountryA'])
print(df)
  CountryA CountryB  MigrantsfromAtoB     ratio
0      foo      bar               123  0.554054
1      foo      qux               221  0.708333
2      bar      qux               133       NaN
3      qux      foo               312  1.411765
4      bar      foo               222  1.804878

Upvotes: 2

Mykola Zotko
Mykola Zotko

Reputation: 17834

You can merge the table to itself (self join in SQL):

df.columns = ['Country_A', 'Country_B', 'A_to_B']

df1 = pd.merge(df, df, left_on=['Country_A', 'Country_B'], right_on=['Country_B', 'Country_A'])
df['ratio'] = df1['A_to_B_x'] / df1['A_to_B_y']

Upvotes: 3

Iron Hand Odin
Iron Hand Odin

Reputation: 430

You can do it by loop but @ansev's answer is more logical:

df = pd.DataFrame({
"A": ["foo", "foo", "bar", "qux", "bar", "qux"], "B": ["bar", "qux", "qux", 
"foo", "foo", "bar"], "AtoB": [123, 221, 133, 312, 222, 444]
})

df["duals"] = df["A"] + df["B"]
df["inv_duals"] = df["B"] + df["A"]

for i in range(len(df)):
    df.loc[i,"AtoB_Ratio"] = df.loc[i, "AtoB"] / df.loc[df["duals"].isin([df.loc[i, "inv_duals"]]), "AtoB"].item()


df = df.drop(["duals", "inv_duals"], axis=1)

Output:

A   B   AtoB    AtoB_Ratio
0   foo bar 123 0.554054
1   foo qux 221 0.708333
2   bar qux 133 0.299550
3   qux foo 312 1.411765
4   bar foo 222 1.804878
5   qux bar 444 3.338346

Upvotes: 1

Related Questions