lukehawk

Reputation: 1493

How do I get which columns of a row are within some values in Pandas?

I know this has to be out there in the ether, but I cannot find it. I am fluent in R, trying to figure out Pandas, and it is making me want to throw this PC out of a window. It has been a long day.

I want to be able to extract the column names of a dataframe, based on the values in the columns of some row:

foo = pd.DataFrame(
[[-1,-5,3,0,-5,8,1,2]],
columns = ('a','b','c','d','e','f','g','h')
)

foo
Out[25]: 
   a  b  c  d  e  f  g  h
0 -1 -5  3  0 -5  8  1  2

I would like to get a vector I can use to subset some other dataframe. This:

foo >= 0

gives me another dataframe, which I cannot use to subset a vector (series? whatever you people refer to it as??)

I want to do something like this:

otherDF[ foo >= 0 ]

Thoughts???

Upvotes: 1

Views: 43

Answers (2)

dataflow

Reputation: 481

You just need to use loc with the column labels you want to keep (e.g. df.loc[:, columns]).

import pandas as pd
import numpy as np

cols = ('a','b','c','d','e','f','g','h')
foo = pd.DataFrame(
[[-1,-5,3,0,-5,8,1,2]],
columns = cols)

bar = pd.DataFrame(np.random.randint(0, 10, (3, len(cols))), columns=cols)

print(foo)

   a  b  c  d  e  f  g  h
0 -1 -5  3  0 -5  8  1  2

print(bar)

   a  b  c  d  e  f  g  h
0  7  9  2  9  5  3  2  9
1  5  7  4  1  5  1  4  0
2  4  9  1  3  3  7  0  2


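# Boolean Series indexed by column name: True where row 0 of foo is >= 0
columns_boolean = foo.iloc[0] >= 0
# Labels of the columns that satisfy the condition
columns_to_keep = foo.columns[columns_boolean]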

print(bar.loc[:, columns_to_keep])


   c  d  f  g  h
0  2  9  3  2  9
1  4  1  1  4  0
2  1  3  7  0  2

Alternatively, if your other dataframe doesn't have the same column names but has the same number of columns, you can still use "loc" but just pass in the boolean array of which columns to keep:

bar.loc[:, columns_boolean.values]



   c  d  f  g  h
0  2  9  3  2  9
1  4  1  1  4  0
2  1  3  7  0  2
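To make the positional case concrete, here is a minimal sketch. The frame `other` and its column names `x0` to `x7` are made up for illustration; the only assumption is that it has the same number of columns as foo. Converting the mask to a plain NumPy array drops the column labels, so loc matches purely by position.

```python
import numpy as np
import pandas as pd

cols = list("abcdefgh")
foo = pd.DataFrame([[-1, -5, 3, 0, -5, 8, 1, 2]], columns=cols)

# Hypothetical second frame: same width as foo, different column names.
other = pd.DataFrame(
    np.arange(24).reshape(3, 8),
    columns=[f"x{i}" for i in range(8)],
)

# Plain boolean array (no labels), True where row 0 of foo is >= 0.
mask = (foo.iloc[0] >= 0).to_numpy()

# With an unlabeled boolean array, loc keeps the columns at the True positions.
selected = other.loc[:, mask]
print(selected.columns.tolist())  # ['x2', 'x3', 'x5', 'x6', 'x7']
```

If the mask were left as a Series, pandas would try to align it on column labels, which is why the labelled version only suits frames that share foo's column names.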

Upvotes: 1

EdChum

Reputation: 394179

IIUC you're after the column mask:

In [25]:
foo[foo >= 0].dropna(axis=1).columns

Out[25]:
Index(['c', 'd', 'f', 'g', 'h'], dtype='object')

If you use the condition to mask the df, the values that fail it become NaN:

In [26]:
foo[foo >= 0]

Out[26]:
    a   b  c  d   e  f  g  h
0 NaN NaN  3  0 NaN  8  1  2

If we then drop the columns with NaN, this leaves just the columns of interest:

In [27]:
foo[foo >= 0].dropna(axis=1)

Out[27]:
   c  d  f  g  h
0  3  0  8  1  2

You can then get just the column names using the .columns attribute.
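As a rough cross-check, the same labels can also be obtained without going through NaN, by reducing the boolean frame column-wise with .all(). This is a sketch of an alternative, not part of the approach above, and the names `via_dropna` and `via_all` are only for illustration.

```python
import pandas as pd

foo = pd.DataFrame([[-1, -5, 3, 0, -5, 8, 1, 2]], columns=list("abcdefgh"))

# Mask, drop any column containing NaN, then read off the remaining labels.
via_dropna = foo[foo >= 0].dropna(axis=1).columns

# Alternative: a column qualifies only if every row satisfies the condition.
via_all = foo.columns[(foo >= 0).all().to_numpy()]

print(list(via_dropna))  # ['c', 'd', 'f', 'g', 'h']
print(list(via_all))     # ['c', 'd', 'f', 'g', 'h']
```

Both forms drop a column as soon as any row fails the condition, so they should agree on frames with more than one row as well, provided the original data contains no NaN of its own.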

Upvotes: 1
