T. Kelly

Reputation: 95

How to select rows based off a column entry using regex to filter?

Here is a schematic of the dataframe I'm working with (note, this is a representative example, and is not meant to demonstrate all possible entries in any column):

Name | Screen | Placeholder for other columns

Bill | GHRF (OOC) | text

Bob | GHRF (IC) | text

Sue | IRMS/CIR (OOC) | text

John | GHRF ISOFORMS IRMS CIR (OOC) | text

I am trying to select all the rows that have (OOC) in the Screen column.

Normally, I would filter a dataframe with something like dfnew = df[df['Column'] == 'Criteria'], but that only matches exact values and doesn't work with a regex.

I have also tried dfnew = df[df['Screen'].filter(regex = r'OOC', axis = 0)], which I thought would work, but it didn't.

Could someone please explain to me how I can select rows based on a column entry using regex?

What I would like to end up with is something like this:

Name | Screen | Placeholder

Bill | GHRF (OOC) | text

Sue | IRMS/CIR (OOC) | text

John | GHRF ISOFORMS IRMS CIR (OOC) | text

Upvotes: 4

Views: 85

Answers (2)

cs95

Reputation: 402323

DataFrame.filter filters on the column names, not values. You're looking for str.contains.

dfnew = df[df['Screen'].str.contains(r'\(OOC\)')]

Or, if you don't need regex, switch it off:

dfnew = df[df['Screen'].str.contains(r'(OOC)', regex=False)]

print(dfnew)
   Name                        Screen
0  Bill                    GHRF (OOC)
2   Sue                IRMS/CIR (OOC)
3  John  GHRF ISOFORMS IRMS CIR (OOC)
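For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the example frame from the question and runs both variants. The column name `Other` is just a stand-in for the placeholder columns, and `na=False` is an extra I've added on the assumption that the real column might contain missing values; drop it if it doesn't.

```python
import pandas as pd

# Example data copied from the question; "Other" is a hypothetical
# stand-in for the remaining columns.
df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue", "John"],
    "Screen": [
        "GHRF (OOC)",
        "GHRF (IC)",
        "IRMS/CIR (OOC)",
        "GHRF ISOFORMS IRMS CIR (OOC)",
    ],
    "Other": ["text"] * 4,
})

# DataFrame.filter matches against labels (here: column names), not values.
by_label = df.filter(regex=r"^Scr")

# Regex match on the values: the parentheses are escaped so they are literal.
mask_regex = df["Screen"].str.contains(r"\(OOC\)", na=False)

# Plain substring match: no escaping needed when regex is switched off.
mask_plain = df["Screen"].str.contains("(OOC)", regex=False, na=False)

dfnew = df[mask_regex]
print(dfnew)
```

Both masks select the same rows here; the non-regex form is simply easier to read when the search term contains regex metacharacters.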

If you're planning to do more indexing/assignment on dfnew, I'd recommend instead creating it with

dfnew = df[df['Screen'].str.contains(r'\(OOC\)')].copy()

This avoids a SettingWithCopyWarning later on.
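As a rough illustration of why the copy helps, the sketch below filters, copies, and then assigns a new column on the result. The `Checked` column is a made-up example of a later assignment, not something from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue", "John"],
    "Screen": [
        "GHRF (OOC)",
        "GHRF (IC)",
        "IRMS/CIR (OOC)",
        "GHRF ISOFORMS IRMS CIR (OOC)",
    ],
})

# .copy() makes dfnew an independent frame rather than a slice of df.
dfnew = df[df["Screen"].str.contains(r"\(OOC\)")].copy()

# Hypothetical follow-up assignment; it only touches dfnew, never df.
dfnew["Checked"] = True
```

The original `df` is left untouched, so there is no ambiguity about which object the assignment modifies.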

Upvotes: 4

BENY

Reputation: 323226

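We can also try str.extract, which pulls out the text inside the first pair of parentheses so it can be compared directly: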

df[df.Screen.str.extract(r'\((.*?)\)', expand=True)[0] == 'OOC']
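Spelled out as a small sketch on the question's sample data (using `expand=False` so the result is a Series rather than a one-column DataFrame; the `tag` name is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue", "John"],
    "Screen": [
        "GHRF (OOC)",
        "GHRF (IC)",
        "IRMS/CIR (OOC)",
        "GHRF ISOFORMS IRMS CIR (OOC)",
    ],
})

# Capture whatever sits inside the first pair of parentheses in each entry.
tag = df["Screen"].str.extract(r"\((.*?)\)", expand=False)

# Keep only the rows whose captured tag is exactly "OOC".
dfnew = df[tag == "OOC"]
print(dfnew)
```

Note that str.extract only looks at the first match per entry, so this assumes the tag of interest is the first parenthesised group in the string. It also gives you the tag as its own Series, which can be handy if you later want to group or count by it.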

Upvotes: 2
