Reputation: 117
I'm working with this pandas dataframe and would like to essentially set what is in the city
column of the match
dataframe to the ball_by_ball
dataframe if the IDs of the match in the match
dataframe is equal to the ID in the ball_by_ball
dataframe.
Ball by ball for all matches:
Data can be accessed https://www.kaggle.com/datasets/patrickb1912/ipl-complete-dataset-20082020
Some information on the dataframe: the match dataframe contains the final results of any particular match ID, while ball by ball contains the information on said match ID.
I tried the following which does "work":
First creating an empty column by
city_ball["city"]=None #city_ball is the same as ball_by_ball
Followed by:
for id in range(len(match["id"])):
for ID in range(len(city_ball["id"])):
if match["id"][id] == city_ball["id"][ID]:
city_ball["city"][ID] = match["city"][id]
But i understand this is a terrible way of implementing it, so I'm looking for suggestions to implement it efficiently.
Upvotes: 0
Views: 50
Reputation: 11105
This looks like a textbook use case for a join or merge, but the first thing I think of is to map each ID to its city using Series.map
:
# Build a Series which maps each id to a corresponding city.
# Drop duplicates to ensure each distinct id only appears once
id2city = match.drop_duplicates(subset='id').set_index('id')['city']
# Apply the mapping to each id in the target DataFrame
city_ball['city'] = city_ball['id'].map(id2city)
Upvotes: 2