MTG

Reputation: 301

Selecting columns from a pandas DataFrame based on column values

I want to select, into a new DataFrame, the columns that contain the value 'C' (keeping the protein column):

protein 1   2   3   4   5
prot1   C   M   D   F   A
prot2   C   D   A   M   A 
prot3   C   C   D   F   A
prot4   S   D   F   C   L
prot5   S   D   A   I   L

So I want to get this:

protein 1   2   4   
prot1   C   M   F   
prot2   C   D   M    
prot3   C   C   F   
prot4   S   D   C   
prot5   S   D   I   

The number of columns can be anything (n). The examples I found all require specifying the column names, which I can't do here. The script should check the columns one by one.

Upvotes: 0

Views: 451

Answers (2)

jezrael

Reputation: 862641

Use:

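import numpy as np
import pandas as pd

np.random.seed(123)
n = np.random.choice(['C','M','D', '-'], size=(3,10))
n[:,0] = ['a','b','w']
foo = pd.DataFrame(n)
print(foo)
#    0  1  2  3  4  5  6  7  8  9
# 0  a  M  D  D  C  D  D  M  -  D
# 1  b  M  D  M  C  M  D  -  M  C
# 2  w  C  -  M  -  D  M  C  C  C

# Boolean Series: True for every column that contains at least one 'C'
mask = foo.eq('C').any()
# always keep the columns needed in the output (here the first, label column)
mask.loc[0] = True

# filter the columns with the boolean mask
print(foo.loc[:,mask])
#    0  1  4  7  8  9
# 0  a  M  C  M  -  D
# 1  b  M  C  -  M  C
# 2  w  C  -  C  C  C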
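Applied to the table from the question, the same boolean-mask idea looks like this. This is a minimal sketch that assumes the column labels are the strings '1' to '5' as displayed and that the label column is named protein; the mask entry for protein is forced to True, just as the answer does with mask.loc[0] = True.

```python
import pandas as pd

# Rebuild the question's table (string column labels '1'..'5' are an assumption)
df = pd.DataFrame({
    'protein': ['prot1', 'prot2', 'prot3', 'prot4', 'prot5'],
    '1': ['C', 'C', 'C', 'S', 'S'],
    '2': ['M', 'D', 'C', 'D', 'D'],
    '3': ['D', 'A', 'D', 'F', 'A'],
    '4': ['F', 'M', 'F', 'C', 'I'],
    '5': ['A', 'A', 'A', 'L', 'L'],
})

# True for every column that contains at least one 'C'
mask = df.eq('C').any()
# always keep the label column, whatever it contains
mask['protein'] = True

# select only the columns whose mask entry is True
result = df.loc[:, mask]
print(result)
```

Because the mask is computed over all columns at once, nothing has to be named except the label column, so this works for any number of data columns.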

Upvotes: 1

MaxU - stand with Ukraine

Reputation: 210842

In [22]: df[['protein']].join(df[df.columns[df.eq('C').any()]])
Out[22]:
  protein  1  2  4
0   prot1  C  M  F
1   prot2  C  D  M
2   prot3  C  C  F
3   prot4  S  D  C
4   prot5  S  D  I
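The one-liner above checks all columns in a single vectorised call. Since the question literally asks to check column by column, here is an equivalent explicit loop as a sketch. It assumes the same table as the question, with string column labels '1' to '5' and a label column named protein; the names keep and result are illustrative only.

```python
import pandas as pd

# The question's table (string column labels are an assumption)
df = pd.DataFrame({
    'protein': ['prot1', 'prot2', 'prot3', 'prot4', 'prot5'],
    '1': ['C', 'C', 'C', 'S', 'S'],
    '2': ['M', 'D', 'C', 'D', 'D'],
    '3': ['D', 'A', 'D', 'F', 'A'],
    '4': ['F', 'M', 'F', 'C', 'I'],
    '5': ['A', 'A', 'A', 'L', 'L'],
})

# Check each data column in turn; keep 'protein' plus every column containing 'C'
keep = ['protein'] + [col for col in df.columns
                      if col != 'protein' and (df[col] == 'C').any()]
result = df[keep]
print(result)
```

The loop is easier to extend with extra per-column conditions; for a plain "contains 'C'" test, the vectorised df.eq('C').any() version is shorter.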

Upvotes: 2
