RGETLB
RGETLB

Reputation: 65

Pandas GroupBy sum concatenates numbers instead of summing them

When I use the following code:

    print(self.df.groupby(by=[2])[3].agg(['sum']))

On the following Dataframe:

            0    1        2        3       4     5     6     7
0          15  LCU   Test 1   308.02  170703  ALCU  4868  MS10
1          16  LCU   Test 2   127.37  170703  ALCU  4868  MS10

The sum function is not completed correctly because the value column (col 3) returns a concatenated string of the values (308.02127.37) instead of maintaining the integrity of the individual values to allow operation.

Upvotes: 4

Views: 6968

Answers (1)

cs95
cs95

Reputation: 402353

It seems like your 3rd column is a string. Did you load in your dataframe using dtype=str?

Furthermore, try not to hardcode your columns. You can use .astype or pd.to_numeric to cast and then apply sum:

self.df.groupby(self.df.columns[2])[self.df.columns[3]].agg(
    lambda x: pd.to_numeric(x, errors='coerce').sum()
)

Or

self.df.groupby(self.df.columns[2])[self.df.columns[3]].agg(
    lambda x: x.astype(float).sum()
)

Upvotes: 11

Related Questions