Alice_inwonderland

Reputation: 348

Pandas groupby does not return the expected output

I have a program that applies DataFrame.groupby().agg('sum') to a number of different pandas.DataFrame objects, all in the same format. The code works on every dataframe except this one (picture: df1), which produces an unexpected result (picture: result1).

I tried:

df = df.groupby('Mapping')[list(df)].agg('sum')

This code works for df2 but not for df1.

[picture: df1]

[picture: result1]

The code works fine for the other dataframes (pictures: df2, result2):

[pictures: df2, result2]

Could somebody explain why df1 produces this result?

Upvotes: 0

Views: 786

Answers (2)

Arturo Sbr

Reputation: 6343

It seems that in df1 most of the numeric columns are actually stored as strings (str). You can tell by the commas (,) used as thousands separators. Try:

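import pandas as pd
num_cols = df.columns[1:]  # every column except the first (the grouping key)
df[num_cols] = df[num_cols].apply(lambda col: col.astype(str).str.replace(",", "", regex=False))  # strip the commas
df[num_cols] = df[num_cols].apply(pd.to_numeric)  # convert the cleaned strings to numbers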

The first step removes the commas from every column after the first (the second, third, and so on). The second step converts those columns to numeric dtypes. This could be a one-liner, but I split it into two steps for readability.

Once this is done, you can try your groupby code.
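Putting the cleanup and the grouping together, here is a minimal end-to-end sketch. The sample data and the column names `Sales` and `Units` are hypothetical stand-ins for df1, and it assumes, like the fix above, that every column after the first holds numbers. It folds the two cleanup steps into a single `apply`.

```python
import pandas as pd

# Hypothetical sample shaped like df1: a 'Mapping' key plus numbers stored as text with commas
df = pd.DataFrame({
    "Mapping": ["A", "A", "B"],
    "Sales": ["1,200", "300", "4,500"],
    "Units": ["10", "2,000", "5"],
})

num_cols = df.columns[1:]  # every column except the 'Mapping' key
# One pass per column: drop the thousands separators, then parse the result as numbers
df[num_cols] = df[num_cols].apply(
    lambda col: pd.to_numeric(col.astype(str).str.replace(",", "", regex=False))
)

# Now the groupby adds the values instead of joining strings
result = df.groupby("Mapping").agg("sum")
print(result)
```

Assigning through `df[num_cols] = ...` replaces the columns outright, so their dtypes change to numeric; assigning through `df.iloc[:, 1:] = ...` may instead write into the existing object columns in place on recent pandas versions.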

It's good practice to check the data types of your columns as soon as you load them. You can do so with df1.dtypes.
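A minimal sketch of that check, on hypothetical data shaped like df1 (the `Sales` column is an assumption). It lists the value columns that are not numeric and reproduces the symptom from the question: summing a text column joins the strings end to end rather than adding them.

```python
import pandas as pd

# Hypothetical frame: the numbers were loaded as text with thousands separators
df = pd.DataFrame({"Mapping": ["A", "A", "B"], "Sales": ["1,200", "300", "4,500"]})
print(df.dtypes)  # Sales is not a numeric dtype

# Value columns (everything after the key) that groupby will not add up as numbers
non_numeric = [c for c in df.columns[1:] if not pd.api.types.is_numeric_dtype(df[c])]
print(non_numeric)  # ['Sales']

# Summing a text column concatenates the strings instead of adding them
result = df.groupby("Mapping").agg("sum")
print(result)
```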

Upvotes: 0

cassioall

Reputation: 11

The problem in the first dataframe is the commas in values that should be numeric: I think pandas is not recognizing those columns as numeric. Did you try removing the commas?
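If the data comes from a CSV file (an assumption, since the question doesn't say how df1 was loaded), the commas can also be handled at load time with the `thousands` parameter of `pd.read_csv`. A minimal sketch, using a hypothetical in-memory CSV in place of the real file:

```python
import io

import pandas as pd

# Hypothetical CSV text; exporters usually quote values that contain commas
csv_text = 'Mapping,Sales\nA,"1,200"\nA,300\nB,"4,500"\n'

# thousands="," tells the parser to ignore the separator when reading numbers
df = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(df.dtypes)  # Sales should now be a numeric dtype

summed = df.groupby("Mapping").agg("sum")
print(summed)
```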

Upvotes: 1
