Reputation: 1493
I know this has to be out there in the ether, but I cannot find it. I am fluent in R, trying to figure out Pandas, and it is making me want to throw this PC out of a window. It has been a long day.
I want to be able to extract the column names of a dataframe, based on the values in the columns of some row:
foo = pd.DataFrame(
[[-1,-5,3,0,-5,8,1,2]],
columns = ('a','b','c','d','e','f','g','h')
)
foo
Out[25]:
a b c d e f g h
0 -1 -5 3 0 -5 8 1 2
I would like to get a vector I can subset some other dataframe by:
foo >= 0
Gives me another dataframe, which I cannot use to subset a vector (series? whatever you people refer to it as??)
I want to do something like this:
otherDF[ foo >= 0 ]
Thoughts???
Upvotes: 1
Views: 43
Reputation: 481
You just need to use loc (e.g. df.loc[:,columns])
import pandas as pd
import numpy as np
cols = ('a','b','c','d','e','f','g','h')
foo = pd.DataFrame(
[[-1,-5,3,0,-5,8,1,2]],
columns = cols)
bar = pd.DataFrame(np.random.randint(0, 10, (3, len(cols))), columns=cols)
print(foo)
a b c d e f g h
0 -1 -5 3 0 -5 8 1 2
print(bar)
a b c d e f g h
0 7 9 2 9 5 3 2 9
1 5 7 4 1 5 1 4 0
2 4 9 1 3 3 7 0 2
columns_boolean = foo.iloc[0] >= 0
columns_to_keep = foo.columns[columns_boolean]
print(bar.loc[:, columns_to_keep])
c d f g h
0 2 9 3 2 9
1 4 1 1 4 0
2 1 3 7 0 2
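The two intermediate steps can also be collapsed. A boolean Series whose index matches the column labels can be passed to loc directly, because pandas aligns it by label. A minimal sketch, with bar hard-coded to the values printed above (instead of random data) so the result is reproducible:

```python
import pandas as pd

cols = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h']
foo = pd.DataFrame([[-1, -5, 3, 0, -5, 8, 1, 2]], columns=cols)

# Fixed copy of the bar frame shown above, so the output does not change between runs.
bar = pd.DataFrame(
    [[7, 9, 2, 9, 5, 3, 2, 9],
     [5, 7, 4, 1, 5, 1, 4, 0],
     [4, 9, 1, 3, 3, 7, 0, 2]],
    columns=cols,
)

# Boolean Series indexed by column label: True where row 0 of foo is >= 0.
mask = foo.iloc[0] >= 0

# loc matches the Series index against bar's column labels.
selected = bar.loc[:, mask]
print(selected)
```

This only works when the labels in the mask line up with the columns of the frame being subset.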
Alternatively, if your other dataframe doesn't have the same column names but has the same number of columns, you can still use loc. Pass in the underlying boolean array, so the columns are picked by position rather than by label:
bar.loc[:, columns_boolean.values]
c d f g h
0 2 9 3 2 9
1 4 1 1 4 0
2 1 3 7 0 2
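To make the positional case concrete, here is a small sketch with a second frame whose column names differ from foo's. The frame other and its values are made up for illustration.

```python
import pandas as pd

foo = pd.DataFrame([[-1, -5, 3, 0, -5, 8, 1, 2]], columns=list('abcdefgh'))

# Hypothetical frame: unrelated column names, same number of columns as foo.
other = pd.DataFrame(
    [[10, 11, 12, 13, 14, 15, 16, 17],
     [20, 21, 22, 23, 24, 25, 26, 27]],
    columns=['s', 't', 'u', 'v', 'w', 'x', 'y', 'z'],
)

# A plain NumPy boolean array carries no labels, so selection is purely positional.
positional_mask = (foo.iloc[0] >= 0).to_numpy()

picked = other.loc[:, positional_mask]
print(picked)
```

Converting to an array matters here: a labelled boolean Series would be aligned against other's column names, which do not match foo's.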
Upvotes: 1
Reputation: 394179
IIUC you're after the column mask:
In [25]:
foo[foo >= 0].dropna(axis=1).columns
Out[25]:
Index(['c', 'd', 'f', 'g', 'h'], dtype='object')
If you use the condition to mask the df, the values that fail it become NaN:
In [26]:
foo[foo >= 0]
Out[26]:
a b c d e f g h
0 NaN NaN 3 0 NaN 8 1 2
If we then drop the columns with NaN, this leaves just the columns of interest:
In [27]:
foo[foo >= 0].dropna(axis=1)
Out[27]:
c d f g h
0 3 0 8 1 2
You can then get just the column names using the .columns attribute.
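One thing to keep in mind: dropna(axis=1) drops a column if any of its rows is NaN, so on a frame with several rows this keeps only the columns where every row passes the condition. It would also drop columns that contained NaN before masking. A sketch comparing it with an explicit column-wise reduction (the names via_dropna, keep and via_all are just for illustration):

```python
import pandas as pd

foo = pd.DataFrame([[-1, -5, 3, 0, -5, 8, 1, 2]], columns=list('abcdefgh'))

# Route 1: mask, drop columns containing NaN, read the remaining labels.
via_dropna = foo[foo >= 0].dropna(axis=1).columns

# Route 2: reduce the boolean frame column-wise; True where every row passes.
keep = (foo >= 0).all()
via_all = foo.columns[keep.to_numpy()]

print(list(via_dropna))
print(list(via_all))
```

For the single-row foo in the question both routes give the same labels.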
Upvotes: 1