TaxiMike
TaxiMike

Reputation: 229

Pandas - Sum series of columns from 1-N

I have some spend columns from week 1 to 52 I am looking to sum the first 26 and the last 26 separately.

I have the following:

column_names = [x for x in df.columns.values.tolist() 
                if x.startswith("spend_")
               ]

This gives me all the columns i'm interested in.

['spend_1', 'spend_2', 'spend_3', 'spend_4', 'spend_5'...]

I can then sum them up as follows:

df['pre_spend'] = df[column_names].sum(axis=1)

This gives me all 52 weeks.

Is there an easy way to select 1_26 and 27_52 and sum separately?

In sas I would be doing this: pre_spend = sum(of spend_1-spend_26 );

Upvotes: 4

Views: 2381

Answers (2)

TaxiMike
TaxiMike

Reputation: 229

Thanks Jezrael works much better than where i'd got to:

column_names = [x for x in df.columns.values.tolist() 
                if x.startswith("spend_")
               ]

pre = df.loc[:,column_names[:26]]
pre = pre.sum(axis=1)
post = df.loc[:,column_names[26:]]
post = post.sum(axis=1)

Upvotes: 0

jezrael
jezrael

Reputation: 862591

I think you need DataFrame.loc for select columns by labels:

a = df.loc[:, 'spend_1':'spend_26'].sum(axis=1)

b = df.loc[:, 'spend_27':'spend_52'].sum(axis=1)

Sample:

np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,6))).add_prefix('spend_')
print (df)
   spend_0  spend_1  spend_2  spend_3  spend_4  spend_5
0        8        8        3        7        7        0
1        4        2        5        2        2        2
2        1        0        8        4        0        9
3        6        2        4        1        5        3
4        4        4        3        7        1        1

print (df.loc[:, 'spend_0':'spend_2'])
   spend_0  spend_1  spend_2
0        8        8        3
1        4        2        5
2        1        0        8
3        6        2        4
4        4        4        3

a = df.loc[:, 'spend_0':'spend_2'].sum(axis=1)
print (a)
0    19
1    11
2     9
3    12
4    11
dtype: int64

print (df.loc[:, 'spend_3':'spend_5'])
   spend_3  spend_4  spend_5
0        7        7        0
1        2        2        2
2        4        0        9
3        1        5        3
4        7        1        1

b = df.loc[:, 'spend_3':'spend_5'].sum(axis=1)
print (b)
0    14
1     6
2    13
3     9
4     9
dtype: int64

Upvotes: 3

Related Questions