How to filter for columns where the first row (not header) starts with string

Question

I'm trying to filter a dataframe by the first row, but can't seem to figure out how to do it.

Here's a sample version of the data I'm working with:

In [11]: df = pd.DataFrame(
    ...:     [['Open-Ended Response', 'Open-Ended Response', 'Response', 'Response'], [1, 2, 3, 4]],
    ...:     columns=list('ABCD'),
    ...: )

In [12]: df
Out[12]:
                     A                    B         C         D
0  Open-Ended Response  Open-Ended Response  Response  Response
1                    1                    2         3         4

What I want to do is filter for all columns that start with "Response" in the first non-header row. So in this case, just have the last two columns in there own dataframe.

I can easily filter the header with something like this:

respo = [col for col in df if col.startswith('Response')]

But it doesn't seem to work if it's the 1t non-header row. Importantly, I need to keep the current header after I filter.

Thank you.

Michael Delgado · Accepted Answer

First step is to select the values of the first row:

df.iloc[0]  # selects the values in the first row

Then, use python's .str StringAccessor methods for working with data values rather than column names:

df.iloc[0].str.startswith('Response') # Test the result of the above line

This will give you a Series with True/False values indexed by column name. Finally, use this to select the columns from your dataframe based on the matched labels:

df.loc[:, df.iloc[0].str.startswith('Response')] # Select columns based on the test

This should do the trick!

See pandas's docs on Indexing and Selecting Data and the StringAccessor methods for more help.

How to filter for columns where the first row (not header) starts with string

Answers (1)

Related Questions