Reputation: 55
I have to group a dataframe by all of its non-numeric columns (numeric columns being float and int) and print the resulting dataframe aggregated by averages. The output should be the first five rows of the resulting dataframe after the groupby operation.
input: csv file
output:
                                                                        Sentiment_Polarity  \
App                   Translated_Review                       Sentiment
10 Best Foods for You 10 best foods 4u Excellent chose foods  Positive                 1.00
                      A big thanks ds I got bst gd health     Positive                 0.10
                      Absolutely Fabulous Phenomenal          Positive                 0.45
                      Amazing                                 Positive                 0.60
                      An excellent A useful                   Positive                 0.65

                                                                        Sentiment_Subjectivity
App                   Translated_Review                       Sentiment
10 Best Foods for You 10 best foods 4u Excellent chose foods  Positive                   0.65
                      A big thanks ds I got bst gd health     Positive                   0.15
                      Absolutely Fabulous Phenomenal          Positive                   0.75
                      Amazing                                 Positive                   0.90
                      An excellent A useful                   Positive                   0.50
Upvotes: 0
Views: 564
Reputation: 7399
You can do that with pandas.DataFrame.select_dtypes: exclude all the numeric columns, so that you are left with the string (object) type columns:
groupcols = df.select_dtypes(exclude="number").columns.tolist()
group_df = df.groupby(groupcols).mean() #.reset_index()
You can reset the index if you want to after these steps.
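To make the steps above concrete, here is a minimal sketch. The sample data is made up for illustration (in practice df would come from pd.read_csv on your file), and the column names are only borrowed from the output shown in the question:

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV from the question.
df = pd.DataFrame({
    "App": ["A", "A", "A", "B"],
    "Sentiment": ["Positive", "Positive", "Negative", "Positive"],
    "Sentiment_Polarity": [1.0, 0.5, -0.5, 0.25],
    "Sentiment_Subjectivity": [0.5, 0.75, 1.0, 0.25],
})

# Every column that is not int/float becomes a grouping key.
groupcols = df.select_dtypes(exclude="number").columns.tolist()

# Average the remaining (numeric) columns within each group.
group_df = df.groupby(groupcols).mean()

# First five rows of the aggregated result, as the question asks.
first_five = group_df.head(5)
print(first_five)
```

Note that groupby drops rows whose grouping keys are NaN by default; pass dropna=False to groupby if you need to keep them.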
You can also use the following to get only the categorical columns (those with the pandas category dtype; plain string columns are not matched):
groupcols = df.select_dtypes(include="category").columns.tolist()
Please read the documentation on how to include/exclude the dtypes that you want.
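A small sketch of the difference between the two selections, using a made-up dataframe in which only one of the text columns has been converted to the category dtype:

```python
import pandas as pd

# Hypothetical data: "App" is a plain string column,
# "Sentiment" has been explicitly created as a Categorical.
df = pd.DataFrame({
    "App": ["A", "A", "B"],
    "Sentiment": pd.Categorical(["Positive", "Negative", "Positive"]),
    "Sentiment_Polarity": [1.0, -0.5, 0.25],
})

# Only columns with the pandas category dtype.
category_cols = df.select_dtypes(include="category").columns.tolist()

# Everything that is not numeric (strings and categoricals alike).
non_numeric_cols = df.select_dtypes(exclude="number").columns.tolist()

print(category_cols)
print(non_numeric_cols)
```

So include="category" is only useful if the columns were read or converted as category; for a CSV loaded with the defaults, exclude="number" is likely the selection you want.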
If your original dataframe is a MultiIndex dataframe, you will need to do this as the first step:
# MultiIndex to columns
df = df.reset_index()
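For illustration, a sketch of the MultiIndex case with a made-up dataframe whose grouping keys start out in the index rather than in columns:

```python
import pandas as pd

# Hypothetical frame whose non-numeric keys live in a MultiIndex.
df = pd.DataFrame(
    {"Sentiment_Polarity": [1.0, 0.5, 0.25]},
    index=pd.MultiIndex.from_tuples(
        [("A", "Positive"), ("A", "Positive"), ("B", "Negative")],
        names=["App", "Sentiment"],
    ),
)

# Move the index levels back into ordinary columns first,
# otherwise select_dtypes cannot see them.
df = df.reset_index()

groupcols = df.select_dtypes(exclude="number").columns.tolist()
group_df = df.groupby(groupcols).mean()
print(group_df)
```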
Upvotes: 1