Reputation: 163
I'm trying to filter a dataframe by the first row, but can't seem to figure out how to do it.
Here's a sample version of the data I'm working with:
In [11]: df = pd.DataFrame(
...: [['Open-Ended Response', 'Open-Ended Response', 'Response', 'Response'], [1, 2, 3, 4]],
...: columns=list('ABCD'),
...: )
In [12]: df
Out[12]:
A B C D
0 Open-Ended Response Open-Ended Response Response Response
1 1 2 3 4
What I want to do is filter for all columns that start with "Response" in the first non-header row. So in this case, just have the last two columns in there own dataframe.
I can easily filter the header with something like this:
respo = [col for col in df if col.startswith('Response')]
But it doesn't seem to work if it's the 1t non-header row. Importantly, I need to keep the current header after I filter.
Thank you.
Upvotes: 0
Views: 1205
Reputation: 15452
First step is to select the values of the first row:
df.iloc[0] # selects the values in the first row
Then, use python's .str
StringAccessor methods for working with data values rather than column names:
df.iloc[0].str.startswith('Response') # Test the result of the above line
This will give you a Series with True/False values indexed by column name. Finally, use this to select the columns from your dataframe based on the matched labels:
df.loc[:, df.iloc[0].str.startswith('Response')] # Select columns based on the test
This should do the trick!
See pandas's docs on Indexing and Selecting Data and the StringAccessor methods for more help.
Upvotes: 1