ArieAI
ArieAI

Reputation: 494

pandas add a value to new column to each row in a group

I have a pandas dataframe with several columns. for examlpe:

     #  name        abbr    country
0   454 Liverpool   UCL England
1   454 Bayern Munich   UCL Germany
2   223 Manchester United   UEL England
3   454 Manchester City UCL England

and I run a function using .gropuby() - but then I want to add to each row of that group the value I calculated once.

The example code is here:

def test_func(abbreviation):
  if abbreviation == 'UCL':
    return 'UEFA Champions League'
  elif abbreviation == 'UEL':
    return 'UEFA Europe Leauge'

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pd.DataFrame(data, columns=['#','name','abbr', 'country'])
competition_df = df.groupby('#').first()
competition_df['competition'] = competition_df.apply(lambda row: test_func(row["abbr"]), axis=1)

and now I would like to add the value of "competition" to all the cases based on group in the original dataframe (df).

Is there a good way (using 'native' pandas) to do it without iterations and lists etc.?


Edit 1:

The final output would then be the original dataframe (df) with the new column:

    #   name    abbr    country competition
0   454 Liverpool   UCL England UEFA Champions League
1   454 Bayern Munich   UCL Germany UEFA Champions League
2   223 Manchester United   UEL England UEFA Europe Leauge
3   454 Manchester City UCL England UEFA Champions League

Edit 2:

I managed to get what I want by zipping, but its a very bad implementation and I am still wondering if I could do it better (and faster using some pandas functions) :

zipped = zip(competition_df.index, competition_df['competition'])
df['competition'] = np.nan
for num, comp in zipped:
  df.loc[df['#']==num, 'competition'] = comp

Upvotes: 0

Views: 88

Answers (1)

bob marley
bob marley

Reputation: 70

I think these might be helpful.

import pandas

data = [[454, 'Liverpool', 'UCL', 'England'], [454, 'Bayern Munich', 'UCL', 'Germany'], [223, 'Manchester United', 'UEL', 'England'], [454, 'Manchester City', 'UCL', 'England']]
df = pandas.DataFrame(data, columns=['#','name','abbr', 'country'])

# option 1
abbreviation_dict = {
    'UCL': 'UEFA Champions League',
    'UEL': 'UEFA Europe Leauge'
}

df['competition'] = df['abbr'].replace(abbreviation_dict)
# option 2 using a function
def get_dict_for_replace(unique_values):
    some_dict = {}
    for unique_value in unique_values:
        if unique_value == 'UCL':
            value_1 = 'UEFA Champions League'  # or whatever is complicated
            some_dict.update({'UCL': value_1})

        elif unique_value  == 'UEL':
            value_2 = 'UEFA Europe Leauge'  # or whatever is complicated
            some_dict.update({'UEL': value_2})

    return some_dict

# get your unique values,
unique_values = df['abbr'].unique() 
# get your dictionary
abbreviation_dict = get_dict_for_replace(unique_values)

df['competition'] = df['abbr'].replace(abbreviation_dict)

Without knowing your exact problem then this is probably the most general if you want to use a function. Run each calculation once. Pass to the dataframe. You can probably pack your dictionary more efficiently based on your actual requirements.

aside: Using groupby on '#' instead of 'abbr' might have unwanted consequences unless the mapping is 1-to-1.

Upvotes: 1

Related Questions