mdicrist
mdicrist

Reputation: 175

Filtering Dataframe Using Headers From Other Dataframes in Python

I am trying to filter a dataframe based on the columns I have previously obtained from filtering the dataframe below.

AA BB CC DD EE FF GG 
0  1  1  0  1  0  0

The dataframe is coming from a file where the data in each row is either a 0 or a 1 and will change based on the file that is loaded in. I have used the following code to filter this dataframe so that my output consists of only columns with a value of 1 in them.

with open('Factors.txt') as b:
    IncludedFactors = pd.read_table(b, sep=',' )
print IncludedFactors

InterestingFactors = IncludedFactors.drop(IncludedFactors.columns[~IncludedFactors.iloc[0].astype(bool)],axis=1)
print InterestingFactors

output:
BB CC EE
1  1  1

I then need to filter out a bigger dataframe that has many headers, however I only need the ID, Xposition, Yposition and the headers of the InterestingFactors dataframe.

Below is the code I have tried, however the output still only consists of 3 headers instead of the 6 I need.

headers = InterestingFactors.columns.values
print headers
PivotTable = InfoTable.filter(items=['ID', 'Postion_X','Position_Y','headers'])
print PivotTable

Any help on how to do this correctly is greatly appreciated!

Upvotes: 0

Views: 95

Answers (1)

Andrew L
Andrew L

Reputation: 7038

Here's one way you can do this:

headers = InterestingFactors.columns.append(pd.Index(['ID','Postion_X','Position_Y']))
PivotTable = InfoTable.loc[:, headers]

This combines the columns you're looking for from InterestingFactors with the 3 columns you mention above. This Index is passed to .loc[].

This also works:

headers = InterestingFactors.columns
PivotTable = InfoTable.loc[:, (pd.Index(['ID','Postion_X','Position_Y']) | headers)]

The datatypes for comparison (I believe) must be the same. Converting your list of 3 standard columns to pd.Index will allow you to use | within .loc[].

Upvotes: 1

Related Questions