John club

Reputation: 19

Is there a method to group by in a DataFrame?

I have a DataFrame like this:

uci_class  doc_id  sentence_id  token
1          1       1            Emmanuel Macron
1          1       1            est
1          1       2            president
1          1       2            de
1          1       1            Emmanuel Macron
1          1       2            aussi
1          1       2            president

I want this output:

uci_class  doc_id  sentence_id  count
1          1       1            2
1          1       2            2
1          2       1            1
1          2       2            2

For example, the first row has count=2 because, if we group by (uci_class, doc_id, sentence_id), we will have two rows with (uci_class=1, doc_id=1, sentence_id=1).

That is what I want to do: a group by.

Upvotes: 1

Views: 45

Answers (1)

ImTryingMyBest

Reputation: 372

Sure, just use the .groupby method, which is documented in the pandas docs (DataFrame.groupby).

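import pandas as pd

# Rebuild the sample data from the question (all values are stored as strings)
df = pd.DataFrame({
    'uci_class': ['1','1','1','1','1','1','1'],
    'doc_id': ['1','1','1','1','1','1','1'],
    'sentence_id': ['1','1','2','2','1','2','2'],
    'token': ['Emmanuel Macron', 'est', 'president', 'de', 'Emmanuel Macron','aussi','president']
})

# Group by the three key columns; count() tallies the non-null values of the
# remaining column, so the counts appear under 'token'
df_grouped = df.groupby(['uci_class','doc_id','sentence_id']).count().reset_index()
print(df_grouped)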
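
If you want the result column to be called count, as in your expected output, a possible variant is .size() with reset_index(name=...). This is only a sketch on the sample data from the question. The column names 'count' and 'distinct_tokens' are my own choices, and the second variant (nunique) is an assumption about what you might mean by "count", since it ignores repeated tokens.

```python
import pandas as pd

# Sample data from the question (values kept as strings)
df = pd.DataFrame({
    'uci_class': ['1', '1', '1', '1', '1', '1', '1'],
    'doc_id': ['1', '1', '1', '1', '1', '1', '1'],
    'sentence_id': ['1', '1', '2', '2', '1', '2', '2'],
    'token': ['Emmanuel Macron', 'est', 'president', 'de',
              'Emmanuel Macron', 'aussi', 'president'],
})

keys = ['uci_class', 'doc_id', 'sentence_id']

# Rows per group, stored in a column named 'count' (name chosen for illustration)
counts = df.groupby(keys).size().reset_index(name='count')
print(counts)

# Assumption: if duplicates should be ignored, count distinct tokens per group
distinct = df.groupby(keys)['token'].nunique().reset_index(name='distinct_tokens')
print(distinct)
```

On this sample, size() counts every row in the group, while nunique() counts "Emmanuel Macron" and "president" only once each.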

As an aside, I see that you are working with natural language processing. I recommend using a library that handles "tokenization" or word-based analysis a bit more gracefully than pandas will. Check out nltk, if you haven't already. To spill the beans, my first-ever experience with Python was teaching myself how to use nltk for a project I had in college. Good luck with your work!

Upvotes: 2
