Petr Petrov
Petr Petrov

Reputation: 4442

Pandas: sum of every N columns

I have dataframe

ID   2016-01   2016-02 ...  2017-01  2017-02 ... 2017-10  2017-11  2017-12
111    12        34           0        12          3        0        0
222    0         32           5         5          0        0        0

I need to count every 12 columns and get

ID   2016   2017
111   46     15
222   32     10

I try to use

(df.groupby((np.arange(len(df.columns)) // 31) + 1, axis=1).sum().add_prefix('s'))

But it returns to all columns But when I try to use

df.groupby['ID']((np.arange(len(df.columns)) // 31) + 1, axis=1).sum().add_prefix('s'))

It returns

TypeError: 'method' object is not subscriptable

How can I fix that?

Upvotes: 3

Views: 5645

Answers (3)

dsps
dsps

Reputation: 21

The above code has a slight syntax error and throws the following error:

ValueError: No axis named 1 for object type

Basically, the groupby condition needs to be wrapped by []. So I'm rewriting the code correctly for convenience:

new_df = df.groupby([[i//n for i in range(0,m)]], axis = 1).sum()

where n is the number of columns you want to group together and m is the total number of columns being grouped. You have to rename the columns after that.

Upvotes: 2

KKJ
KKJ

Reputation: 11

If you don't mind losing the labels, you can try this:

new_df = df.groupby([i//n for i in range(0,m)], axis = 1).sum()

where n is the number of columns you want to group together and m is the total number of columns being grouped. You have to rename the columns after that.

Upvotes: 1

jezrael
jezrael

Reputation: 862591

First set_index of all columns without dates:

df = df.set_index('ID')

1. groupby by splited columns and selected first:

df = df.groupby(df.columns.str.split('-').str[0], axis=1).sum()

2. lambda function for split:

df = df.groupby(lambda x: x.split('-')[0], axis=1).sum()

3. converted columns to datetimes and groupby years:

df.columns = pd.to_datetime(df.columns)
df = df.groupby(df.columns.year, axis=1).sum()

4. resample by years:

df.columns = pd.to_datetime(df.columns)
df = df.resample('A', axis=1).sum()
df.columns = df.columns.year

print (df)
     2016  2017
ID             
111    46    15
222    32    10

Upvotes: 6

Related Questions