Reputation: 123
I want to use np.cumsum() on one column of a CSV file, computed separately for each of the 57 distinct IDs held in another column. My file looks like this:
station_id year Value
210018 1910 1
210018 1911 6
210018 1912 3
210019 1910 2
210019 1911 4
210019 1912 7
I want my output to look like this:
station_id year Value
210018 1910 1
210018 1911 7
210018 1912 10
210019 1910 2
210019 1911 6
210019 1912 13
I am currently using this code, where df is the DataFrame loaded from my file:
df.groupby(['station_id']).apply(lambda x: np.cumsum(['Value']))
which returns:
TypeError: cannot perform accumulate with flexible type
Any help would be appreciated.
Upvotes: 0
Views: 336
Reputation: 880547
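np.cumsum(['Value']), all by itself, raises TypeError: cannot perform accumulate with flexible type. The lambda never uses its argument x, so np.cumsum receives the literal list ['Value'] rather than the column. np.cumsum expects a numerical array as its first argument, not a list of strings.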
Instead use:
values = df.groupby(['station_id'])['Value'].cumsum()
Or you could modify df['Value'] directly:
In [75]: df['Value'] = df.groupby(['station_id'])['Value'].cumsum()
In [76]: df
Out[76]:
station_id year Value
0 210018 1910 1
1 210018 1911 7
2 210018 1912 10
3 210019 1910 2
4 210019 1911 6
5 210019 1912 13
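For anyone who wants to try this without the original CSV, here is a minimal self-contained sketch. It assumes the data can be rebuilt inline from the six sample rows in the question (the real file has 57 stations and would be read from disk instead); nothing else is added beyond the groupby call shown above.

```python
import pandas as pd

# Sample rows copied from the question; the real data would come from the CSV file.
df = pd.DataFrame({
    "station_id": [210018, 210018, 210018, 210019, 210019, 210019],
    "year": [1910, 1911, 1912, 1910, 1911, 1912],
    "Value": [1, 6, 3, 2, 4, 7],
})

# Running total of Value that restarts at each new station_id.
df["Value"] = df.groupby("station_id")["Value"].cumsum()

print(df)
```

The cumulative sum follows the existing row order within each station, so the rows should already be sorted by year if the totals are meant to accumulate chronologically.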
Upvotes: 1