Reputation: 837
I am trying to find the indices of rows that contain a combination of several rows in python. Please find a reproducible code below.
df=[['Monday, 13 January 2020','',''],['Start',13588,'Jack'],['Starting','','Finish'],['','','City 1'],['Task',13589,'Finish'],['','','Address 2'],['','','City'],['Task 3',13589,'Amanda'],['','','Address 3'],['','','City 3'],['Tuesday, 14 January 2020','',''],['Task 4',13587,'Chelsea'],['','','Address 4'],['','','City 4'],['Task 5','13586','Ibrahim'],['','','Address 5'],['','','City 5'],['Task 6',13585,'Kate'],['','','Address 6'],['','','City 6']]
df=pd.DataFrame(df)
df.columns = ['Task','ID','Supervisor']
Here is my attempt. For example, I want to get the indices of rows that contain both start
and finish
or task
and finish
Here is my attempt below:
findrowindex = df[df.apply(lambda row:
row.astype(str).str.contains(r'\b(?:start&finish|task&Finishing)\b').any(), axis=1)].index
Upvotes: 1
Views: 349
Reputation: 71687
You can use:
m = (
df[['Task', 'Supervisor']].agg(' '.join, axis=1)
.str.contains(r'^(?:start.*finish|task.*finish)$', case=False)
)
Or you can use:
m = (
df['Task'].str.contains(r'^(?:start|task)', case=False)
& df['Supervisor'].str.contains(r'finish$', case=False)
)
Result:
# print(df[m])
Task ID Supervisor
2 Starting Finish
4 Task 2 13589 Finish
Upvotes: 2
Reputation: 8302
try this,
import re
# required because there are integer columns in the dataframe.
df = df.astype(str)
compile_ = re.compile("^Start.*Finish.*$|^Task.*Finish.*$", re.IGNORECASE)
mask = df.apply(lambda x : True if compile_.search("".join(x)) else False, axis=1)
print(df[mask])
output,
Task ID Supervisor
2 Starting Finish
4 Task 2 13589 Finish
Upvotes: 0
Reputation: 627100
You may use
>>> df[df.apply(lambda row: row.astype(str).str.contains("Start|Task", case=False, na=False).any() & row.astype(str).str.contains("Finish", case=False, na=False).any(), axis=1)]
Task ID Supervisor
2 Starting Finish
4 Task 2 13589 Finishing
That is, use &
bitwise AND operator with two calls to str.contains
, one searching for Start
or Task
in a case insensitive way (due to case=False
), and the other searching for Finish
.
Upvotes: 1