Reputation: 348
I have a program that applies df.groupby().agg('sum') to a number of pandas.DataFrame objects, all in the same format. The code works on every dataframe except one (picture: df1), which produces an odd result (picture: result1).
I tried:
df = df.groupby('Mapping')[list(df)].agg('sum')
This code works for df2 and the other dataframes (pictures: df2, result2), but not for df1.
Could somebody tell me why it turned out that way for df1?
Upvotes: 0
Views: 786
Reputation: 6343
It seems that in df1, most of the numeric columns are actually str. You can tell by the commas (,) that delimit thousands. Try:
df[df.columns[1:]] = df[df.columns[1:]].apply(lambda col: col.astype(str).str.replace(",", "", regex=False))
df[df.columns[1:]] = df[df.columns[1:]].apply(pd.to_numeric)
The first line removes the commas from every column after the first (the second, third, and so on). The second line converts those columns to numeric data types. This could be a one-liner, but two lines are easier to read.
Once this is done, you can try your groupby code.
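To make the effect concrete, here is a minimal sketch. The frame below is a hypothetical stand-in for df1 (the column names Sales and Cost and all values are invented for illustration, since the real data is only shown in a picture). It assumes the first column is the grouping key and every other column holds numbers stored as strings.

```python
import pandas as pd

# Hypothetical stand-in for df1: numbers stored as strings with thousands commas.
# dtype=object mimics how such columns usually end up after loading.
df1 = pd.DataFrame(
    {
        "Mapping": ["A", "A", "B"],
        "Sales": ["1,200", "300", "4,500"],
        "Cost": ["1,000", "250", "2,000"],
    },
    dtype=object,
)

value_cols = list(df1.columns[1:])  # every column after the grouping key

# On string columns, "sum" joins the strings together instead of adding numbers.
broken = df1.groupby("Mapping")[value_cols].agg("sum")

# Strip the commas, then convert each column to a numeric dtype.
cleaned = df1.copy()
cleaned[value_cols] = cleaned[value_cols].apply(
    lambda col: pd.to_numeric(col.astype(str).str.replace(",", "", regex=False))
)

fixed = cleaned.groupby("Mapping")[value_cols].agg("sum")

print(broken)
print(fixed)
```

Comparing the two printed frames should show concatenated text in the first and real totals in the second.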
It's good practice to check the data types of your columns as soon as you load them. You can do so with df1.dtypes.
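As a rough illustration of checking dtypes at load time: the sketch below assumes the data comes from a CSV file, which the question does not actually say, and uses made-up contents. If that assumption holds, the thousands parameter of pd.read_csv can parse the commas during loading, so no clean-up step is needed afterwards.

```python
import io

import pandas as pd

# Hypothetical CSV contents; quoted fields contain thousands separators.
csv_text = (
    "Mapping,Sales,Cost\n"
    'A,"1,200","1,000"\n'
    "A,300,250\n"
    'B,"4,500","2,000"\n'
)

# Default load: the comma-formatted columns are read as text, not numbers.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw.dtypes)

# Telling the parser about the thousands separator yields numeric columns.
parsed = pd.read_csv(io.StringIO(csv_text), thousands=",")
print(parsed.dtypes)

totals = parsed.groupby("Mapping")[["Sales", "Cost"]].agg("sum")
print(totals)
```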
Upvotes: 0
Reputation: 11
The problem in the first dataframe is the commas in columns that should be numeric; I think pandas is not recognizing those columns as numeric. Did you try removing the commas?
Upvotes: 1