manish

Reputation: 55

Groupby all non-numeric columns and print the resulting dataframe aggregated by averages

I need to group a dataframe by all of its non-numeric columns (numeric columns are those with float or int dtype) and aggregate the remaining columns by their mean. The output should be the first five rows of the resulting dataframe after the groupby operation.

input: csv file

output:

                                                                        Sentiment_Polarity  Sentiment_Subjectivity
App                   Translated_Review                      Sentiment
10 Best Foods for You 10 best foods 4u Excellent chose foods Positive                 1.00                    0.65
                      A big thanks ds I got bst gd health    Positive                 0.10                    0.15
                      Absolutely Fabulous Phenomenal         Positive                 0.45                    0.75
                      Amazing                                Positive                 0.60                    0.90
                      An excellent A useful                  Positive                 0.65                    0.50

Upvotes: 0

Views: 564

Answers (1)

pissall

Reputation: 7399

You can use pandas.DataFrame.select_dtypes to exclude all the numeric columns, which leaves the string (object) and other non-numeric columns to group by:

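# Every column that is not int/float becomes a group key
groupcols = df.select_dtypes(exclude="number").columns.tolist()
group_df = df.groupby(groupcols).mean()  # .reset_index()
print(group_df.head())  # first five rows, as asked in the question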

Afterwards you can call reset_index() if you want the group keys back as ordinary columns.
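For illustration, here is a minimal self-contained sketch of the same approach. The sample data is made up (it only borrows the column names from the question), so treat it as an assumption rather than the asker's actual CSV:

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question
df = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Sentiment": ["Positive", "Positive", "Negative", "Positive"],
    "Sentiment_Polarity": [1.0, 0.5, -0.5, 0.25],
    "Sentiment_Subjectivity": [0.5, 1.0, 0.25, 0.75],
})

# Non-numeric columns become the group keys
groupcols = df.select_dtypes(exclude="number").columns.tolist()

# The remaining (numeric) columns are averaged within each group
group_df = df.groupby(groupcols).mean()
print(group_df.head())
```

Keep in mind that groupby drops rows whose group key is missing by default; if your CSV has NaN in a key column and you want to keep those rows, passing dropna=False to groupby is one option.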

You can also use the following to select only columns with the categorical dtype:

groupcols = df.select_dtypes(include="category").columns.tolist()

Plain string columns are not category dtype unless you convert them with astype("category"). See the select_dtypes documentation for how to include or exclude the dtypes you want.
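As a rough sketch of the category variant (again with made-up data), a string column only shows up under include="category" after it has been converted. The numeric_only=True argument is my addition here, so that the leftover string column is skipped instead of being averaged:

```python
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Sentiment": ["Positive", "Negative", "Positive"],
    "Sentiment_Polarity": [1.0, -0.5, 0.25],
})

# Before conversion no column has the category dtype
before = df.select_dtypes(include="category").columns.tolist()

# Convert one string column, then select by category dtype
df["Sentiment"] = df["Sentiment"].astype("category")
cat_cols = df.select_dtypes(include="category").columns.tolist()

# observed=True keeps only categories that actually occur in the data;
# numeric_only=True skips the non-numeric "App" column
group_df = df.groupby(cat_cols, observed=True).mean(numeric_only=True)
print(group_df)
```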

EDIT:

If the columns you want to group by are in the index (for example, your original dataframe has a MultiIndex), move them back into regular columns as the first step:

# MultiIndex to columns
df = df.reset_index()
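To show why that step matters, here is a small sketch with hypothetical data: while the keys sit in the index, select_dtypes cannot see them, and after reset_index() they are picked up as group columns.

```python
import pandas as pd

# Hypothetical dataframe whose non-numeric keys live in a MultiIndex
df = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Sentiment": ["Positive", "Positive", "Negative"],
    "Sentiment_Polarity": [1.0, 0.5, -0.5],
}).set_index(["App", "Sentiment"])

# select_dtypes only looks at columns, so the index levels are not found
hidden = df.select_dtypes(exclude="number").columns.tolist()

# MultiIndex to columns, then group as before
flat = df.reset_index()
groupcols = flat.select_dtypes(exclude="number").columns.tolist()
group_df = flat.groupby(groupcols).mean()
print(group_df)
```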

Upvotes: 1
