Link dataframe columns on another dataframe, under condition

Question

Following the questions: Python Pandas Only Compare Identically Labeled DataFrame Objects & How to create a new data frame based on conditions from another data frame.

I am trying to create a column in an already existing dataframe df1 with values conditioned on another already existing dataframe df2 with different index and columns.

In df2 ['Home'] cities have a ['Home_ranking']. I need to match the ['Home_ranking'] in df1 with those of df2, the condition being that ['Home'] cities should match. For instance, city B is the first ['Home'] city in df1, I need to get it's ranking in df2, which is 2. So that the first ['Home_ranking'] in df1 is 2 and so forth.

df1 has 50 rows and more columns and df2 only has 20 rows and 4 columns.

>>> print(df1)
      Home     Away     Home_ranking    Away_ranking    [other columns..]
0   city B   city C                0               0
1   city D   city D                0               0
2   city B   city D                0               0
3   city A   city A                0               0

>>> print(df2)
      Home     Away     Home_ranking    Away_ranking
0   city A   city B               12               5
1   city D   city C                4               7
2   city F   city A                1               9
3   city B   city D                2               8

Expected result

>>> print(df1)
      Home     Away     Home_ranking    Away_ranking
0   city B   city C                2               7
1   city D   city D                4               8
2   city B   city D                2               8
3   city A   city A               12               9

I have tried many things suggested in the previous questions, but none of the seem to work for me. I have also tried:

for i in df1.Home :
    if df1.Home == df2.Home :
       df1['Home_ranking'][i] = df2.Home_ranking[i]

yields "ValueError: Can only compare identically-labeled Series objects"

df1['Home_ranking'][i] = df2.loc[(df2['Home'] == df1['Home'][i])].Home_ranking

yields "KeyError: 'city B' I am lost, any help welcome.

tdy · Accepted Answer

You can map() the rankings:

For Home_rankings, the Home column is the matching key, so set df2's index to Home and map its Home_ranking to df1['Home'].
For Away_rankings, the key is Away, so set df2's index to Away and map its Away_ranking to df1['Away'].

df1['Home_ranking'] = df1['Home'].map(
    df2.drop_duplicates(subset=['Home']).set_index('Home')['Home_ranking'])

df1['Away_ranking'] = df1['Away'].map(
    df2.drop_duplicates(subset=['Away']).set_index('Away')['Away_ranking'])

#      Home    Away  Home_ranking  Away_ranking
# 0  city B  city C             2             7
# 1  city D  city D             4             8
# 2  city B  city D             2             8
# 3  city A  city A            12             9

For this use case, mapping should be faster (and scale better) than merging:

%%timeit
df1['Home'].map(df2.drop_duplicates(subset=['Home']).set_index('Home')['Home_ranking'])
df1['Away'].map(df2.drop_duplicates(subset=['Away']).set_index('Away')['Away_ranking'])

# 2.92 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
(df1.drop(columns=['Home_ranking', 'Away_ranking'])
    .merge(df2[['Home', 'Home_ranking']], how='left', on='Home')
    .merge(df2[['Away', 'Away_ranking']], how='left', on='Away')

# 4.59 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Link dataframe columns on another dataframe, under condition

Answers (2)

Related Questions