Reputation: 494
I have a pandas dataframe with several columns. for examlpe:
# name abbr country
0 454 Liverpool UCL England
1 454 Bayern Munich UCL Germany
2 223 Manchester United UEL England
3 454 Manchester City UCL England
and I run a function using .gropuby() - but then I want to add to each row of that group the value I calculated once.
The example code is here:
def test_func(abbreviation):
if abbreviation == 'UCL':
return 'UEFA Champions League'
elif abbreviation == 'UEL':
return 'UEFA Europe Leauge'
data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#','name','abbr', 'country'])
competition_df = df.groupby('#').first()
competition_df['competition'] = competition_df.apply(lambda row: test_func(row["abbr"]), axis=1)
and now I would like to add the value of "competition" to all the cases based on group in the original dataframe (df).
Is there a good way (using 'native' pandas) to do it without iterations and lists etc.?
Edit 1:
The final output would then be the original dataframe (df) with the new column:
# name abbr country competition
0 454 Liverpool UCL England UEFA Champions League
1 454 Bayern Munich UCL Germany UEFA Champions League
2 223 Manchester United UEL England UEFA Europe Leauge
3 454 Manchester City UCL England UEFA Champions League
Edit 2:
I managed to get what I want by zipping, but its a very bad implementation and I am still wondering if I could do it better (and faster using some pandas functions) :
zipped = zip(competition_df.index, competition_df['competition'])
df['competition'] = np.nan
for num, comp in zipped:
df.loc[df['#']==num, 'competition'] = comp
Upvotes: 0
Views: 88
Reputation: 70
I think these might be helpful.
import pandas
data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pandas.DataFrame(data, columns=['#','name','abbr', 'country'])
# option 1
abbreviation_dict = {
'UCL': 'UEFA Champions League',
'UEL': 'UEFA Europe Leauge'
}
df['competition'] = df['abbr'].replace(abbreviation_dict)
# option 2 using a function
def get_dict_for_replace(unique_values):
some_dict = {}
for unique_value in unique_values:
if unique_value == 'UCL':
value_1 = 'UEFA Champions League' # or whatever is complicated
some_dict.update({'UCL': value_1})
elif unique_value == 'UEL':
value_2 = 'UEFA Europe Leauge' # or whatever is complicated
some_dict.update({'UEL': value_2})
return some_dict
# get your unique values,
unique_values = df['abbr'].unique()
# get your dictionary
abbreviation_dict = get_dict_for_replace(unique_values)
df['competition'] = df['abbr'].replace(abbreviation_dict)
Without knowing your exact problem then this is probably the most general if you want to use a function. Run each calculation once. Pass to the dataframe. You can probably pack your dictionary more efficiently based on your actual requirements.
aside: Using groupby on '#' instead of 'abbr' might have unwanted consequences unless the mapping is 1-to-1.
Upvotes: 1