ashleh

Reputation: 225

Filter a pandas column using regex within the header

I'm reading an Excel file into a pandas DataFrame, but one of the column headers contains a lot of comment text. Somewhere in that text is the keyword 'Measure', which appears only in this one header. How can I select whichever header has the keyword 'Measure' anywhere in it, for example with 'contains'?

The following code filters my DataFrame on 3 conditions. For the third condition, I want it to identify the column whose header includes the text 'measure', as opposed to having to write the header out in full as 'hereisalltherandomtextmeasure'.

filtered = df[(df['Mode'].isin(mode_filter)) & (df['Level'].isin(level_filter)) & (df['hereisalltherandomtextmeasure'].isin(measure_filter))]

The reason I'm trying to do this is because I'm running the same code on multiple files but the 'measure' column changes for each file.

First file:

Mode | Level | hereisalltherandomtextmeasure

Second file:

Mode | Level | hereismorerandomtextmeasure

The only constant is that both headers contain the word 'measure', so ideally I'd like to identify the column by that word alone rather than by the full header string.

Thanks.

Upvotes: 2

Views: 3080

Answers (1)

EdChum

Reputation: 394129

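IIUC, you can call str.contains on df.columns to test whether your keyword appears anywhere in each header, then index df.columns with the resulting boolean mask. Note that the match is case-sensitive by default: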

In [7]:
import pandas as pd
df = pd.DataFrame(columns=['hereisall the random textMeasure', 'Measurement', 'asdasds'])
df.columns[df.columns.str.contains('Measure')]

Out[7]:
Index(['hereisall the random textMeasure', 'Measurement'], dtype='object')
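To plug this into the filter from the question, one option is to resolve the header name first and then use it like any other column label. The following is only a sketch: the sample data and the values of mode_filter, level_filter and measure_filter are made up for illustration, and case=False is an assumption so that both 'measure' and 'Measure' match.

```python
import pandas as pd

# Hypothetical sample data standing in for one of the Excel files.
df = pd.DataFrame({
    "Mode": ["A", "B", "A"],
    "Level": [1, 2, 1],
    "hereisalltherandomtextmeasure": ["x", "y", "z"],
})

# Hypothetical filter values; the question does not show the real ones.
mode_filter = ["A"]
level_filter = [1]
measure_filter = ["x", "y"]

# Headers containing the keyword; case=False matches 'measure' and 'Measure'.
matches = df.columns[df.columns.str.contains("measure", case=False)]

# Stop early unless exactly one header matches, so a file with zero or
# several 'measure' headers is not silently filtered on the wrong column.
if len(matches) != 1:
    raise ValueError(f"expected exactly one 'measure' column, found {list(matches)}")
measure_col = matches[0]

# Same three conditions as in the question, with the resolved header name.
filtered = df[
    df["Mode"].isin(mode_filter)
    & df["Level"].isin(level_filter)
    & df[measure_col].isin(measure_filter)
]
print(filtered)
```

As the output above shows, a plain substring match also picks up headers such as 'Measurement', which is why the sketch checks that exactly one header matched before using it.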

Upvotes: 1
