user511792

Reputation: 519

Python pandas dataframe: filter columns using a list?

I have a dataframe that is large: 100000 rows * 10000 cols

I'm given a list of labels (call it list1) that do not exactly match the column labels in this dataframe but match part of them. For example, a column label in the dataframe might be "string1,D111", while the corresponding label in list1 is "D111".

I want to find all the columns that correspond to the labels in list1 and then sum those columns. What is the most efficient way to do this?

Dataframe:
       string1,D111       string2,D222          string3,D333   ......    stringn,Dnnn
1         ..                   ..                     ..                     ..
2
3
4
5
6
...


My list1:  D111, D333,...Dxxx

Upvotes: 1

Views: 1864

Answers (1)

Jeff

Reputation: 128948

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.random.randn(10, 10), columns=['c_%s' % i for i in range(3)] + ['d_%s' % i for i in range(3)] + ['e_%s' % i for i in range(4)])

In [4]: df.filter(regex='d_|e_')
Out[4]: 
        d_0       d_1       d_2       e_0       e_1       e_2       e_3
0 -0.022661 -0.504317  0.279227  0.286951 -0.126999 -1.658422  1.577863
1  0.501654  0.145550 -0.864171 -0.374261 -0.399360  1.217679  1.357648
2 -0.608580  1.138143  1.228663  0.427360  0.256808  0.105568 -0.037422
3 -0.993896 -0.581638 -0.937488  0.038593 -2.012554 -0.182407  0.689899
4  0.424005 -0.913518  0.405155 -1.111424 -0.180506  1.211730  0.118168
5  0.701127  0.644692 -0.188302 -0.561400  0.748692 -0.585822  1.578240
6  0.475958 -0.901369 -0.734969  1.090093  1.297208  1.140128  0.173941
7 -0.679514 -0.790529 -2.057733  0.420175  1.766671 -0.797129 -0.825583
8 -0.918645  0.916237  0.992001 -0.440573 -1.875960 -1.223502  0.084821
9  1.096687 -1.414057 -0.268211  0.253461 -0.175931  1.481261 -0.200600
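
Applied to the labels in the question, the same idea might look like the sketch below. The small frame and its values are made up for illustration, and it assumes each column label ends with a comma followed by the part that appears in list1. Since "sum all these columns" could mean either, it computes both a per-row sum across the matched columns and a per-column sum.

```python
import re

import numpy as np
import pandas as pd

# Hypothetical stand-in for the real 100000 x 10000 frame.
columns = ["string1,D111", "string2,D222", "string3,D333", "string4,D444"]
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

list1 = ["D111", "D333"]

# Escape each label so it is matched literally, and anchor the match on the
# comma separator and the end of the label so "D11" cannot match "D111".
pattern = ",(?:" + "|".join(re.escape(label) for label in list1) + ")$"

# DataFrame.filter keeps the columns whose label matches the regex.
matched = df.filter(regex=pattern)

row_totals = matched.sum(axis=1)     # one sum per row, across matched columns
column_totals = matched.sum(axis=0)  # one sum per matched column

# Alternative without a regex: compare the part after the last comma
# against a set, which avoids building one very long pattern.
wanted = set(list1)
mask = [col.rsplit(",", 1)[-1] in wanted for col in df.columns]
matched_alt = df.loc[:, mask]
```

With a very long list1, the set-based variant may be the safer choice, since it does a constant-time lookup per column instead of relying on a large alternation pattern. Which one is faster on the real data would need to be timed.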

Upvotes: 8
