Shweta Kamble

Reputation: 432

Iterating through dataframe using regular expression python

I am trying to extract SI-type patterns from a DataFrame column into another column or into a list. I tried two things:

|    a             |
-------------------+
| Builder          |
| left             |
| SI_NAME lide_on  |
| SI_ID 456        |
| Scheduling Info  |

df['b']= df['a'].apply(lambda row: re.findall('\SI_\w+\s',row))  

and

list_DF=[]
for index,row in df.iterrows():
    list_DF.append(re.findall('\SI_\w+\s',row[0]))

Neither gives me the result I want; the first one returned an empty list in the new column.

Upvotes: 2

Views: 3224

Answers (1)

Wiktor Stribiżew

Reputation: 626794

You may use something like

df['b'] = df['a'].str.findall(r'^SI_\w+')

The .str accessor applies the regex to each entry of the column as a string, so there is no need for apply with a lambda.

The ^SI_\w+ pattern matches SI_ followed by one or more word characters, and only at the beginning of the string (due to ^). It looks like the entries you are after follow this pattern. Since findall returns a list for each row, you may add .apply(','.join) or something like that at the end to get string data in the resulting column.
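As a rough, self-contained sketch of the above: the DataFrame below is rebuilt by hand from the table in the question, and the extra column name b_str is only an illustration, not something from the original post.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption
# about how the real DataFrame looks).
df = pd.DataFrame(
    {"a": ["Builder", "left", "SI_NAME lide_on", "SI_ID 456", "Scheduling Info"]}
)

# One list of matches per row; rows without a match get an empty list.
df["b"] = df["a"].str.findall(r"^SI_\w+")

# Optional: join each list into a plain string ("b_str" is a hypothetical name).
df["b_str"] = df["b"].apply(",".join)

print(df)
```

Rows that do not start with SI_ end up with an empty list in b and an empty string in b_str, so you can filter them out afterwards if you only want the matching rows.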

Upvotes: 4
