Pandas sum of variable number of columns

Question

I have a pandas dataframe like this -

Time    1 A    2 A     3 A     4 A     5 A    6 A    100 A
    5    10     4       6       6       4      6      4
    3    7      19      2       7       7      9      18
    6    3      6       3       3       8      10     56
    2    5      9       1       1       9      12     13

The Time columns gives me the number of A columns that I need to sum up.So that the output looks like this -

 Time   1 A    2 A     3 A     4 A     5 A    6 A    100 A    Total
    5    10     4       6       6       4      6      4         30
    3    7      19      2       7       7      9      18        28
    6    3      6       3       3       8      10     56        33
    2    5      9       1       1       9      12     13        14

In other words, when the value in Time column is 3, it should sum up 1A, 2A and 3A when the value in Time column is 5, it should sum up 1A, 2A, 3A, 4A and 5A

Note: There are other columns also in between the As. So I cant sum using simple indexing.

Highly appreciate any help in finding a solution.

jezrael · Accepted Answer

Use numpy - idea is compare array created by np.arange with length of columns with Time columns converted to index with broadcasting to 2d mask, get matched values by numpy.where and last sum:

df1 = df.set_index('Time')
m = np.arange(len(df1.columns)) < df1.index.values[:, None]
df['new'] = np.where(m, df1.values, 0).sum(axis=1)
print (df)
   Time  1 A  2 A  3 A  4 A  5 A  6 A  100 A  new
0     5   10    4    6    6    4    6      4   30
1     3    7   19    2    7    7    9     18   28
2     6    3    6    3    3    8   10     56   33
3     2    5    9    1    1    9   12     13   14

Details:

print (df1)
      1 A  2 A  3 A  4 A  5 A  6 A  100 A
Time                                     
5      10    4    6    6    4    6      4
3       7   19    2    7    7    9     18
6       3    6    3    3    8   10     56
2       5    9    1    1    9   12     13

print (m) 
[[ True  True  True  True  True False False]
 [ True  True  True False False False False]
 [ True  True  True  True  True  True False]
 [ True  True False False False False False]]

print (np.where(m, df1.values, 0))
[[10  4  6  6  4  0  0]
 [ 7 19  2  0  0  0  0]
 [ 3  6  3  3  8 10  0]
 [ 5  9  0  0  0  0  0]]

Pandas sum of variable number of columns

Answers (2)

Related Questions