Reputation: 17
I want to add the league the club plays in, and the country that league is based in, as new columns, for every row.
I initially tried this using dictionaries with the clubs/countries, and returning the key for a value:
club_country_dict = {'La Liga':['Real Madrid','FC Barcelona'],'France Ligue 1':['Paris Saint-Germain']}
key_list=list(club_country_dict.keys())
val_list=list(club_country_dict.values())
But this ran into issues since each of my keys values is actually a list, rather than a single value.
I then tried some IF THEN logic, by just having standalone variables for each league, and checking if the club value was in each variable:
la_Liga = ['Real Madrid','FC Barcelona']
for row in data:
if data['Club'] in la_Liga:
data['League'] = 'La Liga'
Apologies for the messy question. Basically I'm looking to add a two new columns to my dataset, 'League' and 'Country', based on the 'Club' column value. I'm not sure what's the easiest way to do this but I've hit walls trying to different ways. Thanks in advance.
Upvotes: 0
Views: 2489
Reputation: 31011
To cope with the "list problem" in club_country_dict, convert it to the following Series:
league_club = pd.Series(club_country_dict.values(), index=club_country_dict.keys(),
name='Club').explode()
The result is:
La Liga Real Madrid
La Liga FC Barcelona
France Ligue 1 Paris Saint-Germain
Name: Club, dtype: object
You should also have a "connection" between the league name and its country (another Series):
league_country = pd.Series({'La Liga': 'Spain', 'France Ligue 1': 'France'}, name='Country')
Of course, add here other leagues of interest with their countries.
The next step is to join them into club_details DataFrame, with Club as the index:
club_details = league_club.to_frame().join(league_country).reset_index()\
.rename(columns={'index':'League'}).set_index('Club')
The result is:
League Country
Club
Paris Saint-Germain France Ligue 1 France
Real Madrid La Liga Spain
FC Barcelona La Liga Spain
Then, assuming that your first DataFrame is named player, generate the final result:
result = player.join(club_details, on='Club')
The result is:
Name Club League Country
0 Cristiano Ronaldo Real Madrid La Liga Spain
1 L. Messi FC Barcelona La Liga Spain
2 Neymar Paris Saint-Germain France Ligue 1 France
Upvotes: 1
Reputation: 1496
Try pandas replace
feature for Series.
df = pd.DataFrame({"Name" : ['Cristiano', 'L. Messi', "Neymar"], 'Club' : ["Real Madrid", "FC Barcelona", "Paris Saint-Germain"]})
df:
Name Club
0 Cristiano Real Madrid
1 L. Messi FC Barcelona
2 Neymar Paris Saint-Germain
Now add new column:
club_country_dict = {'Real Madrid': 'La Liga',
'FC Barcelona' : "La Liga",
'Paris Saint-Germain': 'France Ligue 1'}
df['League'] = df.Club.replace(club_country_dict)
df:
Name Club League
0 Cristiano Real Madrid La Liga
1 L. Messi FC Barcelona La Liga
2 Neymar Paris Saint-Germain France Ligue 1
Upvotes: 1
Reputation: 407
Here is one of the simple way to solve it. Use Pandas apply function on rows https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
import pandas as pd
df = pd.DataFrame({"name": ["CR", "Messi", "neymar"], "club": ["RM", "BR", "PSG"]})
country = {"BR": "Spain", "PSG": "France", "RM": "Spain"}
df["country"] = df.apply(lambda row: country[row.club], axis=1)
print(df)
Output:
name club country
0 CR RM Spain
1 Messi BR Spain
2 neymar PSG France
Upvotes: 1
Reputation: 2332
You could convert the dictionary to a data frame and then merge:
df = pd.DataFrame({"Name": ["CR", "Messi", "neymar"], "Club": ["Real Madrid", "FC Barcelona", "Paris Saint-Germain"]})
df.merge(pd.DataFrame(club_country_dict.items(), columns=['League', 'Club']).explode('Club'),
on = 'Club', how='left')
Upvotes: 3