Reputation: 59
Following the questions: Python Pandas Only Compare Identically Labeled DataFrame Objects & How to create a new data frame based on conditions from another data frame.
I am trying to create a column in an already existing dataframe df1 with values conditioned on another already existing dataframe df2 with different index and columns.
In df2 ['Home'] cities have a ['Home_ranking']. I need to match the ['Home_ranking'] in df1 with those of df2, the condition being that ['Home'] cities should match. For instance, city B is the first ['Home'] city in df1, I need to get it's ranking in df2, which is 2. So that the first ['Home_ranking'] in df1 is 2 and so forth.
df1 has 50 rows and more columns and df2 only has 20 rows and 4 columns.
>>> print(df1)
Home Away Home_ranking Away_ranking [other columns..]
0 city B city C 0 0
1 city D city D 0 0
2 city B city D 0 0
3 city A city A 0 0
>>> print(df2)
Home Away Home_ranking Away_ranking
0 city A city B 12 5
1 city D city C 4 7
2 city F city A 1 9
3 city B city D 2 8
Expected result
>>> print(df1)
Home Away Home_ranking Away_ranking
0 city B city C 2 7
1 city D city D 4 8
2 city B city D 2 8
3 city A city A 12 9
I have tried many things suggested in the previous questions, but none of the seem to work for me. I have also tried:
for i in df1.Home :
if df1.Home == df2.Home :
df1['Home_ranking'][i] = df2.Home_ranking[i]
yields "ValueError: Can only compare identically-labeled Series objects"
df1['Home_ranking'][i] = df2.loc[(df2['Home'] == df1['Home'][i])].Home_ranking
yields "KeyError: 'city B' I am lost, any help welcome.
Upvotes: 1
Views: 1451
Reputation: 41327
You can map()
the rankings:
Home_rankings
, the Home
column is the matching key, so set df2
's index to Home
and map its Home_ranking
to df1['Home']
.Away_rankings
, the key is Away
, so set df2
's index to Away
and map its Away_ranking
to df1['Away']
.df1['Home_ranking'] = df1['Home'].map(
df2.drop_duplicates(subset=['Home']).set_index('Home')['Home_ranking'])
df1['Away_ranking'] = df1['Away'].map(
df2.drop_duplicates(subset=['Away']).set_index('Away')['Away_ranking'])
# Home Away Home_ranking Away_ranking
# 0 city B city C 2 7
# 1 city D city D 4 8
# 2 city B city D 2 8
# 3 city A city A 12 9
For this use case, mapping should be faster (and scale better) than merging:
%%timeit
df1['Home'].map(df2.drop_duplicates(subset=['Home']).set_index('Home')['Home_ranking'])
df1['Away'].map(df2.drop_duplicates(subset=['Away']).set_index('Away')['Away_ranking'])
# 2.92 ms ± 72.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
%%timeit
(df1.drop(columns=['Home_ranking', 'Away_ranking'])
.merge(df2[['Home', 'Home_ranking']], how='left', on='Home')
.merge(df2[['Away', 'Away_ranking']], how='left', on='Away')
# 4.59 ms ± 30.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
Upvotes: 1
Reputation: 68146
You don't need loops. This is a couple of merge operations:
import pandas
df1 = pandas.DataFrame({
"Home": list("BDBA"),
"Away": list("CDDA"),
"Home_Ranking": [0] * 4,
"Away_Ranking": [0] * 4,
})
df2 = pandas.DataFrame({
"Home": list("ADFB"),
"Away": list("BCAD"),
"Home_Ranking": [12, 4, 1 ,2],
"Away_Ranking": [5, 7, 9, 8],
})
results = (
df1.drop(columns=["Home_Ranking", "Away_Ranking"])
.merge(df2[["Home", "Home_Ranking"]], how='left', on="Home")
.merge(df2[["Away", "Away_Ranking"]], how='left', on="Away")
)
And that gets you:
Home Away Home_Ranking Away_Ranking
B C 2 7
D D 4 8
B D 2 8
A A 12 9
Upvotes: 1