Using np.cumsum on a grouped csv file

Question

I want to apply np.cumsum() to one column of a CSV file, computed separately for each of 57 distinct IDs stored in another column. My file looks like this:

station_id     year           Value
210018         1910            1
210018         1911            6
210018         1912            3
210019         1910            2
210019         1911            4
210019         1912            7

I want my output to look like this:

station_id     year           Value
210018         1910            1
210018         1911            7
210018         1912            10
210019         1910            2
210019         1911            6
210019         1912            13

I am currently using this code, where df is the DataFrame loaded from my file:

df.groupby(['station_id']).apply(lambda x: np.cumsum(['Value']))

which returns:

TypeError: cannot perform accumulate with flexible type

Any help would be appreciated.

unutbu · Accepted Answer

np.cumsum(['Value']), all by itself, raises

TypeError: cannot perform accumulate with flexible type

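(np.cumsum expects a numerical array as its first argument, not a list of strings. The lambda in the question never uses its argument x, so it is the literal string 'Value' that gets passed, not the column.) Instead use: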

values = df.groupby(['station_id'])['Value'].cumsum()
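If you would rather keep np.cumsum in the call, here is a minimal sketch of how the original lambda could be repaired. It assumes a df rebuilt by hand from the sample rows in the question; the names per_group and builtin are only illustrative. The point is that np.cumsum has to receive each group's Series rather than the list ['Value']:

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (rebuilt by hand for illustration).
df = pd.DataFrame({
    "station_id": [210018, 210018, 210018, 210019, 210019, 210019],
    "year": [1910, 1911, 1912, 1910, 1911, 1912],
    "Value": [1, 6, 3, 2, 4, 7],
})

# The original lambda passed the list ['Value'] to np.cumsum and ignored x.
# Here each group's 'Value' Series (s) is what np.cumsum actually receives.
per_group = df.groupby("station_id")["Value"].transform(lambda s: np.cumsum(s))

# The built-in grouped cumsum computes the same thing without the lambda.
builtin = df.groupby("station_id")["Value"].cumsum()

print(per_group.tolist())
print(builtin.tolist())
```

Both results are aligned to df's original index, so either can be assigned back to a column. The built-in cumsum is the shorter spelling and avoids calling a Python function once per group.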

Or you could modify df['Value'] directly:

In [75]: df['Value'] = df.groupby(['station_id'])['Value'].cumsum()

In [76]: df
Out[76]: 
   station_id  year  Value
0      210018  1910      1
1      210018  1911      7
2      210018  1912     10
3      210019  1910      2
4      210019  1911      6
5      210019  1912     13
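For completeness, a self-contained sketch of the whole round trip, starting from CSV text. The whitespace-delimited layout and the in-memory io.StringIO stand-in for the file are assumptions based on how the sample is displayed in the question; with a real file you would pass its path to pd.read_csv and adjust sep to match. The sort step is also an assumption: the cumulative sum follows row order, so it only matters if the years are not already in order within each station.

```python
import io
import pandas as pd

# Stand-in for the CSV file; the layout is assumed from the question's sample.
raw = """station_id year Value
210018 1910 1
210018 1911 6
210018 1912 3
210019 1910 2
210019 1911 4
210019 1912 7
"""

# sep=r"\s+" splits on runs of whitespace; use sep="," for a comma-separated file.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# cumsum accumulates in row order, so sort by year within each station first.
df = df.sort_values(["station_id", "year"])

# Running total of Value, restarted for each station_id.
df["Value"] = df.groupby("station_id")["Value"].cumsum()

print(df)
```

The same call works unchanged for 57 stations, since groupby restarts the running total at every new station_id.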
