BenTapscott
BenTapscott

Reputation: 25

Pivot table in pandas to count unique values

I have a dataframe of English words with French translations, here is a simplified version:

English French
to provide fournir
to provide fournir
to provide offrir, fournir
to provide offrir
to provide fournir
to provide offrir

I would like to create a pivot table which counts the occurences of each translation, like so:

English French translation Number of times
to provide fournir 3
to provide offrir, fournir 1
to provide offrir 2

I have tried using the code below, but (1) it doesn't retain the French translations, and (2) there is only one row for each English word, whereas I would expect several rows, one for each given French translation.

df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index='english', values='french',aggfunc=pd.Series.nunique)
English French
to provide 712
to argue 521

In this example from my real data, there is only ONE row for "to provide", whereas I want several, with the count for each different French translation. I have also tried the following code:

df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index=['english','french'], values='french', aggfunc=pd.Series.nunique)

But I get the following error:

ValueError: Grouper for 'french' not 1-dimensional

As a bonus, I also need to expand the French translations where there are two (or more) words and add them to the count, like so, but I think this is stage two:

English French translation Number of times
to provide fournir 4
to provide offrir 3

Thank you for your help.

Upvotes: 0

Views: 646

Answers (1)

Corralien
Corralien

Reputation: 120469

Use value_counts:

>>> df.value_counts(['English', 'French']) \
      .rename('Number of times').reset_index()

      English           French  Number of times
0  to provide          fournir                3
1  to provide           offrir                2
2  to provide  offrir, fournir                1

Upvotes: 1

Related Questions