Aparajita Dasgupta
Aparajita Dasgupta

Reputation: 21

Conditional summing of columns in pandas

I have the following database in Pandas:

Student-ID   Last-name   First-name  HW1  HW2  HW3  HW4  HW5  M1   M2  Final
59118211           Alf        Brian   96   90   88   93   96  78   60   59.0
59260567      Anderson         Jill   73   83   96   80   84  80   52   42.5
59402923     Archangel      Michael   99   80   60   94   98  41   56    0.0
59545279         Astor         John   93   88   97  100   55  53   53   88.9
59687635        Attach         Zach   69   75   61   65   91  90   63   69.0

I want to add only those columns which have "HW" in them. Any suggestions on how I can do that?

Note: The number of columns containing HW may differ. So I can't reference them directly.

Upvotes: 0

Views: 368

Answers (2)

Mark Graph
Mark Graph

Reputation: 5121

John's solution - using df.filter() - is more elegant, but you could also consider a list comprehension ...

df[[x for x in df.columns if 'HW' in x]].sum(axis=1)

Upvotes: 1

Zero
Zero

Reputation: 76917

You could all df.filter(regex='HW') to return column names like 'HW' and then apply sum row-wise via sum(axis-1)

In [23]: df
Out[23]:
   StudentID   Lastname Firstname  HW1  HW2  HW3  HW4  HW5  HW6  HW7   M1
0   59118211        Alf     Brian   96   90   88   93   96   97   88   10
1   59260567   Anderson      Jill   73   83   96   80   84   99   80  100
2   59402923  Archangel   Michael   99   80   60   94   98   73   97   50
3   59545279      Astor      John   93   88   97  100   55   96   86   60
4   59687635     Attach      Zach   69   75   61   65   91   89   82   55
5   59829991       Bake      Jake   56    0   77   78    0   79    0   10

In [24]: df.filter(regex='HW').sum(axis=1)
Out[24]:
0    648
1    595
2    601
3    615
4    532
5    290
dtype: int64

Upvotes: 2

Related Questions