Reputation: 229
I have some spend columns from week 1 to 52 I am looking to sum the first 26 and the last 26 separately.
I have the following:
column_names = [x for x in df.columns.values.tolist()
if x.startswith("spend_")
]
This gives me all the columns i'm interested in.
['spend_1', 'spend_2', 'spend_3', 'spend_4', 'spend_5'...]
I can then sum them up as follows:
df['pre_spend'] = df[column_names].sum(axis=1)
This gives me all 52 weeks.
Is there an easy way to select 1_26 and 27_52 and sum separately?
In sas I would be doing this: pre_spend = sum(of spend_1-spend_26 );
Upvotes: 4
Views: 2381
Reputation: 229
Thanks Jezrael works much better than where i'd got to:
column_names = [x for x in df.columns.values.tolist()
if x.startswith("spend_")
]
pre = df.loc[:,column_names[:26]]
pre = pre.sum(axis=1)
post = df.loc[:,column_names[26:]]
post = post.sum(axis=1)
Upvotes: 0
Reputation: 862591
I think you need DataFrame.loc
for select columns by labels:
a = df.loc[:, 'spend_1':'spend_26'].sum(axis=1)
b = df.loc[:, 'spend_27':'spend_52'].sum(axis=1)
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(5,6))).add_prefix('spend_')
print (df)
spend_0 spend_1 spend_2 spend_3 spend_4 spend_5
0 8 8 3 7 7 0
1 4 2 5 2 2 2
2 1 0 8 4 0 9
3 6 2 4 1 5 3
4 4 4 3 7 1 1
print (df.loc[:, 'spend_0':'spend_2'])
spend_0 spend_1 spend_2
0 8 8 3
1 4 2 5
2 1 0 8
3 6 2 4
4 4 4 3
a = df.loc[:, 'spend_0':'spend_2'].sum(axis=1)
print (a)
0 19
1 11
2 9
3 12
4 11
dtype: int64
print (df.loc[:, 'spend_3':'spend_5'])
spend_3 spend_4 spend_5
0 7 7 0
1 2 2 2
2 4 0 9
3 1 5 3
4 7 1 1
b = df.loc[:, 'spend_3':'spend_5'].sum(axis=1)
print (b)
0 14
1 6
2 13
3 9
4 9
dtype: int64
Upvotes: 3