Aemonar
Aemonar

Reputation: 63

GroupBy and then Mean/Median/Mode using str.contains Criteria

I have the following dataset example:

Name | Year | Score | 2nd Score | % of People | Country | Fruit | Export Countries | Language | Transit Duration | Quality | Taste | Freshness | Packaging
Andes, The | 2021 | 8 | 8.8 | 87% | The Netherlands | The Apple | United States,United Kingdom | English,Japanese,French | 148.0 | 1.0 | 0.0 | 0.0 | 0.0
Phil | 2021 | 8 | 8.4 | 87% | Spain | The Banana | United Kingdom, Germany | English,German,French,Italian | 165.0 | 1.0 | 0.0 | 0.0 | 0.0
Sarah | 2021 | 9 | 8.3 | 89% | Greece | The Plum | Germany,United States | English,German,French,Italian | 153.0 | 1.0 | 0.0 | 0.0 | 0.0

I want to be able to take the 'Language' column and for any entry that contains 'German', do a mean/median/mode on the Score column. The expected output for Mean would be something like:

English  8.33  
German  8.5
French  8.33
Italian  8.5
Japanese  8

I've got a syntax that will split them out into into their own columns and then I know I could do a logic on that way, however I would like to learn if the way that I want to analyse the dataset is even possible.

EDIT: Code as requested

[{'Name': 'Andes, The',
  'Year': 2021,
  'Score': '8',
  '2nd Score': 8.8,
  '% of People': '87%',
  'Country': 'The Netherlands',
  'Fruit': 'The Apple',
  'Export Countries': 'United States,United Kingdom',
  'Language': 'English,Japanese,French',
  'Transit Duration': 148.0,
  'Quality': 1.0,
  'Taste': 0.0,
  'Freshness': 0.0,
  'Packaging': 0.0},
 {'Name': 'Phil',
  'Year': 2021,
  'Score': '8',
  '2nd Score': 8.8,
  '% of People': '87%',
  'Country': 'Spain',
  'Fruit': 'The Banana',
  'Export Countries': 'United Kingdom, Germany',
  'Language': 'English,German,French,Italian',
  'Transit Duration': 118.0,
  'Quality': 1.0,
  'Taste': 0.0,
  'Freshness': 0.0,
  'Packaging': 0.0},
{'Name': 'Sarah',
  'Year': 2021,
  'Score': '9',
  '2nd Score': 8.8,
  '% of People': '89%',
  'Country': 'Greece',
  'Fruit': 'The Plum',
  'Export Countries': 'Germany,United States',
  'Language': 'English,German,French,Italian',
  'Transit Duration': 165.0,
  'Quality': 1.0,
  'Taste': 0.0,
  'Freshness': 0.0,
  'Packaging': 0.0},
{'Name': 'William',
  'Year': 2021,
  'Score': '6',
  '2nd Score': 8.8,
  '% of People': '65%',
  'Country': 'Brazil',
  'Fruit': 'Strawberries',
  'Export Countries': 'Spain,Greece',
  'Language': 'Spanish, French',
  'Transit Duration': 153.0,
  'Quality': 1.0,
  'Taste': 0.0,
  'Freshness': 0.0,
  'Packaging': 0.0},

Upvotes: 1

Views: 131

Answers (2)

mozway
mozway

Reputation: 260845

To get various aggregators on the Score column after filtering, you can use the following. NB. "mode" is not an aggregation function as there can be several modes, so I built one to only get the first mode.

def mode(s):
    return pd.Series.mode(s)[0]

(df[df['Language'].str.contains('German')]
  .assign(Language=lambda d: d['Language'].str.split(','))
  .explode('Language')
  ['Score'].agg(['mean',
                 'median',
                 mode,
                ])
)

Output:

mean      8.5
median    8.5
mode      8.0
Name: Score, dtype: float64

Per language:

def mode(s):
    return pd.Series.mode(s)[0]

(df[df['Language'].str.contains('German')]
  .assign(Language=lambda d: d['Language'].str.split(','))
  .explode('Language')
  .groupby('Language')
  ['Score'].agg(['mean',
                 'median',
                 mode,
                ])
)

Output:

mean    median  mode
Language            
English 8.5 8.5 8
French  8.5 8.5 8
German  8.5 8.5 8
Italian 8.5 8.5 8

Upvotes: 0

akuiper
akuiper

Reputation: 214977

You can split and explode Language column to make it flat and then aggregate by each individual language:

(df[['Score', 'Language']]
  .assign(Language=lambda x: x.Language.str.split(','))
  .explode('Language')
  .groupby('Language')
  .Score.mean()
  .reset_index())

   Language     Score
0   English  8.333333
1    French  8.333333
2    German  8.500000
3   Italian  8.500000
4  Japanese  8.000000

Upvotes: 1

Related Questions