Reputation: 95
Here is a schematic of the dataframe I'm working with (note, this is a representative example, and is not meant to demonstrate all possible entries in any column):
Name | Screen | Placeholder for other columns
Bill | GHRF (OOC) | text
Bob | GHRF (IC) | text
Sue | IRMS/CIR (OOC) | text
John | GHRF ISOFORMS IRMS CIR (OOC) | text
I am trying to select all the rows that have (OOC) in the Screen column.
Normally, I would filter a dataframe with something like dfnew = df[df['Column'] == 'Criteria'], but that doesn't work with a regex.
I have also tried dfnew = df[df['Screen'].filter(regex = r'OOC', axis = 0)], which I thought would work, but didn't.
Could someone please explain to me how I can select rows based on a column entry using regex?
What I would like to wind up with is something like this:
Name | Screen | Placeholder
Bill | GHRF (OOC) | text
Sue | IRMS/CIR (OOC) | text
John | GHRF ISOFORMS IRMS CIR (OOC) | text
Upvotes: 4
Views: 85
Reputation: 402323
DataFrame.filter selects by index or column labels, not by values. You're looking for str.contains.
dfnew = df[df['Screen'].str.contains(r'\(OOC\)')]
Or, if you don't need regex, switch it off:
dfnew = df[df['Screen'].str.contains(r'(OOC)', regex=False)]
print(dfnew)
Name Screen
0 Bill GHRF (OOC)
2 Sue IRMS/CIR (OOC)
3 John GHRF ISOFORMS IRMS CIR (OOC)
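For anyone who wants to reproduce this, here is a minimal, self-contained sketch. The DataFrame is rebuilt by hand from the sample rows in the question (the placeholder column is left out), so its construction is an assumption about the data rather than the asker's actual code.

```python
import pandas as pd

# Sample rows copied from the question; the placeholder column is omitted.
df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue", "John"],
    "Screen": [
        "GHRF (OOC)",
        "GHRF (IC)",
        "IRMS/CIR (OOC)",
        "GHRF ISOFORMS IRMS CIR (OOC)",
    ],
})

# Regex match: the parentheses are escaped so they are treated literally.
mask_regex = df["Screen"].str.contains(r"\(OOC\)")

# Plain substring match: no escaping is needed when regex is switched off.
mask_literal = df["Screen"].str.contains("(OOC)", regex=False)

# Boolean indexing keeps only the rows where the mask is True.
dfnew = df[mask_regex]
print(dfnew)
```

Both masks select the same rows here. Without the escaping (and with regex left on), the parentheses would form a regex group and the pattern would match on OOC alone.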
If you're planning to do more indexing/assignment on dfnew, I'd recommend instead creating it with
dfnew = df[df['Screen'].str.contains(r'\(OOC\)')].copy()
to avoid a SettingWithCopyWarning later on.
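As a rough illustration of why the copy helps, the sketch below edits dfnew after filtering and leaves df untouched. The sample data is trimmed from the question, and the "Reviewed" column and the "GHRF (checked)" value are made up for the example.

```python
import pandas as pd

# Trimmed sample data based on the question.
df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue"],
    "Screen": ["GHRF (OOC)", "GHRF (IC)", "IRMS/CIR (OOC)"],
})

# .copy() makes dfnew an independent DataFrame rather than a selection from df.
dfnew = df[df["Screen"].str.contains(r"\(OOC\)")].copy()

# Hypothetical follow-up edits: the column name and new value are illustrative only.
dfnew["Reviewed"] = True
dfnew.loc[dfnew["Name"] == "Bill", "Screen"] = "GHRF (checked)"

print(dfnew)
print(df)  # the original frame is unchanged
```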
Upvotes: 4
Reputation: 323226
We can try str.extract:
df[df.Screen.str.extract(r'\((.*?)\)', expand=True)[0] == 'OOC']
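Spelled out step by step, that one-liner might look like the sketch below. The DataFrame is again rebuilt from the question's sample rows, and the intermediate name inside is introduced only for readability.

```python
import pandas as pd

# Sample rows copied from the question; the placeholder column is omitted.
df = pd.DataFrame({
    "Name": ["Bill", "Bob", "Sue", "John"],
    "Screen": [
        "GHRF (OOC)",
        "GHRF (IC)",
        "IRMS/CIR (OOC)",
        "GHRF ISOFORMS IRMS CIR (OOC)",
    ],
})

# Capture the text inside the first pair of parentheses in each entry.
# With expand=True and one capture group, the result is a DataFrame with column 0.
inside = df["Screen"].str.extract(r"\((.*?)\)", expand=True)[0]

# Keep only the rows whose captured text is exactly "OOC".
dfnew = df[inside == "OOC"]
print(dfnew)
```

Unlike str.contains, this compares only the first parenthesized group, so an entry with another parenthesized tag before (OOC) would not be selected.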
Upvotes: 2