ems
ems

Reputation: 990

Pandas sum of variable number of columns

I have a pandas dataframe like this -

Time    1 A    2 A     3 A     4 A     5 A    6 A    100 A
    5    10     4       6       6       4      6      4
    3    7      19      2       7       7      9      18
    6    3      6       3       3       8      10     56
    2    5      9       1       1       9      12     13

The Time columns gives me the number of A columns that I need to sum up.So that the output looks like this -

 Time   1 A    2 A     3 A     4 A     5 A    6 A    100 A    Total
    5    10     4       6       6       4      6      4         30
    3    7      19      2       7       7      9      18        28
    6    3      6       3       3       8      10     56        33
    2    5      9       1       1       9      12     13        14

In other words, when the value in Time column is 3, it should sum up 1A, 2A and 3A when the value in Time column is 5, it should sum up 1A, 2A, 3A, 4A and 5A

Note: There are other columns also in between the As. So I cant sum using simple indexing.

Highly appreciate any help in finding a solution.

Upvotes: 1

Views: 646

Answers (2)

jezrael
jezrael

Reputation: 862591

Use numpy - idea is compare array created by np.arange with length of columns with Time columns converted to index with broadcasting to 2d mask, get matched values by numpy.where and last sum:

df1 = df.set_index('Time')
m = np.arange(len(df1.columns)) < df1.index.values[:, None]
df['new'] = np.where(m, df1.values, 0).sum(axis=1)
print (df)
   Time  1 A  2 A  3 A  4 A  5 A  6 A  100 A  new
0     5   10    4    6    6    4    6      4   30
1     3    7   19    2    7    7    9     18   28
2     6    3    6    3    3    8   10     56   33
3     2    5    9    1    1    9   12     13   14

Details:

print (df1)
      1 A  2 A  3 A  4 A  5 A  6 A  100 A
Time                                     
5      10    4    6    6    4    6      4
3       7   19    2    7    7    9     18
6       3    6    3    3    8   10     56
2       5    9    1    1    9   12     13

print (m) 
[[ True  True  True  True  True False False]
 [ True  True  True False False False False]
 [ True  True  True  True  True  True False]
 [ True  True False False False False False]]

print (np.where(m, df1.values, 0))
[[10  4  6  6  4  0  0]
 [ 7 19  2  0  0  0  0]
 [ 3  6  3  3  8 10  0]
 [ 5  9  0  0  0  0  0]]

Upvotes: 3

Aryerez
Aryerez

Reputation: 3495

Try:

df['total'] = df.apply(lambda x: sum([x[i+1] for i in range(x['Time'])]), axis=1)

Upvotes: 1

Related Questions