Reputation: 25
I have a dataframe of English words with French translations, here is a simplified version:
English | French |
---|---|
to provide | fournir |
to provide | fournir |
to provide | offrir, fournir |
to provide | offrir |
to provide | fournir |
to provide | offrir |
I would like to create a pivot table which counts the occurences of each translation, like so:
English | French translation | Number of times |
---|---|---|
to provide | fournir | 3 |
to provide | offrir, fournir | 1 |
to provide | offrir | 2 |
I have tried using the code below, but (1) it doesn't retain the French translations, and (2) there is only one row for each English word, whereas I would expect several rows, one for each given French translation.
df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index='english', values='french',aggfunc=pd.Series.nunique)
English | French |
---|---|
to provide | 712 |
to argue | 521 |
In this example from my real data, there is only ONE row for "to provide", whereas I want several, with the count for each different French translation. I have also tried the following code:
df_eng_word_trans = pd.pivot_table(df_user_vocabulary,index=['english','french'], values='french', aggfunc=pd.Series.nunique)
But I get the following error:
ValueError: Grouper for 'french' not 1-dimensional
As a bonus, I also need to expand the French translations where there are two (or more) words and add them to the count, like so, but I think this is stage two:
English | French translation | Number of times |
---|---|---|
to provide | fournir | 4 |
to provide | offrir | 3 |
Thank you for your help.
Upvotes: 0
Views: 646
Reputation: 120469
Use value_counts
:
>>> df.value_counts(['English', 'French']) \
.rename('Number of times').reset_index()
English French Number of times
0 to provide fournir 3
1 to provide offrir 2
2 to provide offrir, fournir 1
Upvotes: 1