Reputation: 25
I am new to Python. I want to calculate the sum, mean, median and standard deviation of each column, but instead of a number I get one long string as the answer.
import pandas as pd

df = pd.DataFrame({
    'apple': {
        0: '15.8',
        1: '3562',
        2: '51.36',
        3: '179868',
        4: '6.0',
        5: ''
    },
    'banana': {
        0: '27.84883300816733',
        1: '44.64197389840307',
        2: '',
        3: '13.3',
        4: '17.6',
        5: '6.1'
    },
    'cheese': {
        0: '27.68303400840678',
        1: '39.93121897299962',
        2: '',
        3: '9.4',
        4: '7.2',
        5: '6.0'
    },
    'egg': {
        0: '',
        1: '7.2',
        2: '66.0',
        3: '23.77814972104277',
        4: '23967',
        5: ''
    }
})
For example, to calculate the sum of the apple column, I used
df['apple'].sum()
It gives me an output of 15.8356251.361798686.0, which is strange: the values seem to have been joined together rather than added.
Kindly help.
Upvotes: 0
Views: 198
Reputation: 3290
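Your columns hold strings, so sum() concatenates them instead of adding them. Convert every column to numbers first. With errors='coerce', any value that cannot be parsed, such as the empty strings, becomes NaN and is skipped by the statistics. The 50% row of describe() is the median: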
df = df.apply(pd.to_numeric, errors='coerce')
df.describe()
               apple     banana     cheese           egg
count       5.000000   5.000000   5.000000      4.000000
mean    36700.632000  21.898161  18.042851   6015.994537
std     80047.651817  14.955567  15.077552  11967.362577
min         6.000000   6.100000   6.000000      7.200000
25%        15.800000  13.300000   7.200000     19.633612
50%        51.360000  17.600000   9.400000     44.889075
75%      3562.000000  27.848833  27.683034   6041.250000
max    179868.000000  44.641974  39.931219  23967.000000
df.sum()
apple     183503.160000
banana       109.490807
cheese        90.214253
egg        24063.978150
dtype: float64
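If you only need the four statistics from the question, a minimal sketch along these lines may help. It rebuilds two of the question's columns as plain lists (a shortened stand-in for the original frame, with dtype=object set explicitly so the values stay ordinary Python strings), shows why the string sum comes out as one long string, and then asks for sum, mean, median and std in a single call. The variable names concatenated, numeric and stats are only illustrative.

```python
import pandas as pd

# Two columns from the question, shortened; every value is a string
# and '' marks a missing entry. dtype=object is an assumption made here
# to keep the values as plain Python strings.
df = pd.DataFrame({
    'apple': ['15.8', '3562', '51.36', '179868', '6.0', ''],
    'egg': ['', '7.2', '66.0', '23.77814972104277', '23967', ''],
}, dtype=object)

# On a column of strings, sum() applies + to strings, i.e. it concatenates.
concatenated = df['apple'].sum()
print(concatenated)  # 15.8356251.361798686.0, as reported in the question

# Convert to floats; unparseable values (here the empty strings) become NaN.
numeric = df.apply(pd.to_numeric, errors='coerce')

# One row per statistic, one column per original column; NaN is skipped.
stats = numeric.agg(['sum', 'mean', 'median', 'std'])
print(stats)
```

A single value can then be read with, for example, stats.loc['median', 'egg'].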
Upvotes: 1