user86907
user86907

Reputation: 837

How to match a combination of two strings in python to find indices of rows?

I am trying to find the indices of rows that contain a combination of several rows in python. Please find a reproducible code below.

  df=[['Monday, 13 January 2020','',''],['Start',13588,'Jack'],['Starting','','Finish'],['','','City 1'],['Task',13589,'Finish'],['','','Address 2'],['','','City'],['Task 3',13589,'Amanda'],['','','Address 3'],['','','City 3'],['Tuesday, 14 January 2020','',''],['Task 4',13587,'Chelsea'],['','','Address 4'],['','','City 4'],['Task 5','13586','Ibrahim'],['','','Address 5'],['','','City 5'],['Task 6',13585,'Kate'],['','','Address 6'],['','','City 6']]
df=pd.DataFrame(df)
df.columns = ['Task','ID','Supervisor']

Here is my attempt. For example, I want to get the indices of rows that contain both start and finish or task and finish

Here is my attempt below:

    findrowindex = df[df.apply(lambda row: 
row.astype(str).str.contains(r'\b(?:start&finish|task&Finishing)\b').any(), axis=1)].index

Upvotes: 1

Views: 349

Answers (3)

Shubham Sharma
Shubham Sharma

Reputation: 71687

You can use:

m = (
    df[['Task', 'Supervisor']].agg(' '.join, axis=1)
    .str.contains(r'^(?:start.*finish|task.*finish)$', case=False)
)

Or you can use:

m = (
    df['Task'].str.contains(r'^(?:start|task)', case=False) 
    & df['Supervisor'].str.contains(r'finish$', case=False)
)

Result:

# print(df[m])

       Task     ID Supervisor
2  Starting            Finish
4    Task 2  13589     Finish

Upvotes: 2

sushanth
sushanth

Reputation: 8302

try this,

import re

# required because there are integer columns in the dataframe.
df = df.astype(str)

compile_ = re.compile("^Start.*Finish.*$|^Task.*Finish.*$", re.IGNORECASE)
mask = df.apply(lambda x : True if compile_.search("".join(x)) else False, axis=1)

print(df[mask])

output,

       Task     ID Supervisor
2  Starting            Finish
4    Task 2  13589     Finish

Upvotes: 0

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 627100

You may use

>>> df[df.apply(lambda row: row.astype(str).str.contains("Start|Task", case=False, na=False).any() & row.astype(str).str.contains("Finish", case=False, na=False).any(), axis=1)]
       Task     ID Supervisor
2  Starting            Finish
4    Task 2  13589  Finishing

That is, use & bitwise AND operator with two calls to str.contains, one searching for Start or Task in a case insensitive way (due to case=False), and the other searching for Finish.

Upvotes: 1

Related Questions