Reputation: 739
I want to create a new data frame which has 2 columns, grouped by Striker_Id
and other column which has sum of 'Batsman_Scored' corresponding to the grouped 'Striker_Id'
Eg:
Striker_ID Batsman_Scored
1 0
2 8
...
I tried this ball.groupby(['Striker_Id'])['Batsman_Scored'].sum()
but this is what I get:
Striker_Id
1 0000040141000010111000001000020000004001010001...
2 0000000446404106064011111011100012106110621402...
3 0000121111114060001000101001011010010001041011...
4 0114110102100100011010000000006010011001111101...
5 0140016010010040000101111100101000111410011000...
6 1100100000104141011141001004001211200001110111...
It doesn't sum, only joins all the numbers. What's the alternative?
Upvotes: 1
Views: 2229
Reputation: 402483
For some reason, your columns were loaded as strings. While loading them from a CSV, try applying a converter -
df = pd.read_csv('file.csv', converters={'Batsman_Scored' : int})
Or,
df = pd.read_csv('file.csv', converters={'Batsman_Scored' : pd.to_numeric})
If that doesn't work, then convert to integer after loading -
df['Batsman_Scored'] = df['Batsman_Scored'].astype(int)
Or,
df['Batsman_Scored'] = pd.to_numeric(df['Batsman_Scored'], errors='coerce')
Now, performing the groupby should work -
r = df.groupby('Striker_Id')['Batsman_Scored'].sum()
Without access to your data, I can only speculate. But it seems like, at some point, your data contains non-numeric data that prevents pandas from being able to perform conversions, resulting in those columns being retained as strings. It's a little difficult to pinpoint this problematic data until you actually load it in and do something like
df.col.str.isdigit().any()
That'll tell you if there are any non-numeric items. Note that it only works for integers, float columns cannot be debugged like this.
Also, another way of seeing what columns have corrupt data would be to query dtypes
-
df.dtypes
Which will give you a listing of all columns and their datatypes. Use this to figure out what columns need parsing -
for c in df.columns[df.dtypes == object]:
print(c)
You can then apply the methods outlined above to fix them.
Upvotes: 1