BarathVutukuri

Reputation: 1303

Pandas filtering rows with regex pattern present in the row itself

I have a pandas DataFrame with 2 columns: one contains a regex pattern and the other contains the actual string. I want to keep only the rows where the string in the data column matches the pattern in the same row.

My data is in a CSV file and looks like this:

pattern,data
1234.*,abcd
567_.*,567_hello

I expect the output DataFrame to look like this:

pattern,data
567_.*,567_hello

I tried applying a lambda function to each row of the DataFrame, but none of these attempts worked:

df[df.apply(lambda row: re.compile(row[0]).match(row[1]))]
df[df.apply(lambda row: re.compile(row[0].str).match(row[1].str))]
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']))]

I could achieve this by iterating over the rows and building an entirely new DataFrame from the ones that match, but iterating over a DataFrame is not efficient. I am looking for a more pythonic approach.

Upvotes: 2

Views: 239

Answers (1)

Shohruh Abduakhatov

Reputation: 94

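Your third attempt needs two changes: pass axis=1 so that apply calls the lambda once per row rather than once per column, and compare the result of match with None so that the mask holds booleans instead of match objects. After that modification, here is the result: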

import re
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']) is not None, axis=1)]
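
For anyone who wants to try this end to end, here is a minimal sketch using the sample data from the question. It is not the only way to do it: it swaps apply for a plain list comprehension over the two columns, which builds the same boolean mask. The names csv_text, mask and filtered are made up for illustration, and io.StringIO only stands in for the real CSV file.

```python
import io
import re

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the CSV file.
csv_text = "pattern,data\n1234.*,abcd\n567_.*,567_hello\n"
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Build a boolean mask, one entry per row.
# re.match anchors the pattern at the start of the string only.
mask = [
    re.match(pattern, value) is not None
    for pattern, value in zip(df["pattern"], df["data"])
]

# Boolean indexing keeps only the rows where the mask is True.
filtered = df[mask]
print(filtered)
```

Note that re.match only requires the pattern to match at the beginning of the string. If the whole string has to match the pattern, re.fullmatch can be used in its place.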

Upvotes: 1
