Reputation: 1303
I have a pandas DataFrame with two columns: one containing a regex pattern and the other containing an actual string. I want to keep only the rows where the string in the data column matches the pattern in the pattern column.
My data is in a csv file and it looks like below.
pattern,data
1234.*,abcd
567_.*,567_hello
I am expecting the output data frame to be as shown below.
pattern,data
567_.*,567_hello
I tried applying a lambda function to each row of the DataFrame, but without success:
df[df.apply(lambda row: re.compile(row[0]).match(row[1]))]
df[df.apply(lambda row: re.compile(row[0].str).match(row[1].str))]
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']))]
I could achieve this by iterating over the rows and building a new DataFrame from the ones that match, but iterating over a DataFrame is inefficient. I am looking for a more Pythonic approach.
Upvotes: 2
Views: 239
Reputation: 94
Your third attempt was close, but it fails for two reasons: apply works column-wise by default, so you need axis=1 to pass each row to the lambda, and match returns a match object or None rather than a boolean, so the result cannot be used as a mask directly. Comparing against None fixes both:
df[df.apply(lambda row: re.compile(row['pattern']).match(row['data']) is not None, axis=1)]
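For completeness, here is a minimal self-contained sketch using the sample data from the question (re.match(p, s) is shorthand for re.compile(p).match(s)):

import re
import pandas as pd

# Sample data mirroring the CSV from the question
df = pd.DataFrame({"pattern": ["1234.*", "567_.*"],
                   "data": ["abcd", "567_hello"]})

# axis=1 passes each row to the lambda; "is not None" turns the
# match object (or None) into the boolean the mask needs
mask = df.apply(lambda row: re.match(row["pattern"], row["data"]) is not None, axis=1)
print(df[mask])
#   pattern       data
# 1  567_.*  567_hello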
Upvotes: 1