Reputation: 142
I want to sum the numerical values in each row (Store A to Store D) for the month of June and place them in an appended column 'Sum'. But the results generate very huge sum values which are wrong. How to get correct sum?
This code was run using Python 3.6 :
import pandas as pd
import numpy as np
data = np.array([
['', 'week','storeA','storeB','storeC','storeD'],
[0,"2014-05-04",2643,8257,3893,6231],
[1,"2014-05-11",6444,5736,5634,7092],
[2,"2014-05-18",9646,2552,4253,5447],
[3,"2014-05-25",5960,10740,8264,6063],
[4,"2014-06-04",5960,10740,8264,6063],
[5,"2014-06-12",7412,7374,3208,3985]
])
df= pd.DataFrame(data=data[1:,1:],
index=data[1:,0],
columns=data[0,1:])
print(df)
# get rows of table which match Year,Month for last month
df2 = df[df['week'].str.contains("2014-06")].copy()
print(df2)
# generate col summing up each row
col_list = list(df2)
print(col_list)
col_list.remove('week')
print(col_list)
df2['Sum'] = df2[col_list].sum(axis=1)
print(df2)
Output of Sum column for rows 4 and 5:
Row4 - 5.960107e+16
Row5 - 7.412737e+15
Upvotes: 3
Views: 1112
Reputation: 403012
Because of the way your array is defined, with mixed strings and objects, everything is coerced to string. Take a look at this:
df.dtypes
week object
storeA object
storeB object
storeC object
storeD object
dtype: object
You have columns of strings, and sum
on string dataframes results in concatenation.
The solution is to convert these to integers first -
df2[col_list] = df2[col_list].astype(int)
Your code then works.
df2[col_list].sum(axis=1)
4 31027
5 21979
dtype: int64
Alternatively, declare data
as a object
array -
data = np.array([[...], [...], ...], dtype=object)
df = pd.DataFrame(data=data[1:,1:], index=data[1:,0], columns=data[0,1:])
Next, perform a soft conversion using infer_objects
(new in v0.22
):
df = df.infer_objects()
df.dtypes
week object
storeA int64
storeB int64
storeC int64
storeD int64
dtype: object
Works like a charm.
Upvotes: 1
Reputation: 153510
Use astype, to convert those strings to ints and sum works properly:
df2['Sum'] = df2[col_list].astype(int).sum(axis=1)
Output:
week storeA storeB storeC storeD Sum
4 2014-06-04 5960 10740 8264 6063 31027
5 2014-06-12 7412 7374 3208 3985 21979
What was happening,you were summing (concatenating) strings.
Upvotes: 3