Iterating through dataframe using regular expression python

Question

I am trying to parse SI-type patterns from one column of a DataFrame into another column or into a list. I tried two things:

|    a             |
-------------------+
| Builder          |
| left             |
| SI_NAME lide_on  |
| SI_ID 456        |
| Scheduling Info  |

df['b']= df['a'].apply(lambda row: re.findall('\SI_\w+\s',row))

and

list_DF=[]
for index,row in df.iterrows():
    list_DF.append(re.findall('\SI_\w+\s',row[0]))

I am not able to get the expected result; the first attempt returned an empty list in the new column.

Wiktor Stribiżew · Accepted Answer

You may use something like

df['b'] = df['a'].str.findall(r'^SI_\w+')

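Using .str applies the regex element-wise to the string values in the column. Note that it does not convert other types: in an object column, non-string entries yield NaN, so cast with .astype(str) first if you need every value treated as a string.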
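To see how .str treats values that are not strings, here is a minimal sketch. The mixed Series is made-up sample data, not from the question, and the names s, found and found_cast are only illustrative.

```python
import pandas as pd

# Hypothetical mixed column: one string and one integer
s = pd.Series(["SI_ID 456", 456], dtype=object)

# .str.findall only processes the string entry; the integer comes back as NaN
found = s.str.findall(r"^SI_\w+")

# Casting first turns 456 into "456", so every row gets a (possibly empty) list
found_cast = s.astype(str).str.findall(r"^SI_\w+")

print(found.tolist())
print(found_cast.tolist())
```

If the column is known to hold only strings, the cast is unnecessary.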

The ^SI_\w+ pattern matches SI_ followed by one or more word characters, and only at the beginning of the string (due to ^). The entries you are after appear to follow this pattern. Since findall returns a list for each row, you may add .apply(','.join) or something similar at the end to get string data in the resulting column.
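Put together, the approach can be sketched as follows. The DataFrame is rebuilt from the sample rows in the question; the extra column name c is an assumption used only to keep the list and string results side by side.

```python
import pandas as pd

# Sample data taken from the table in the question
df = pd.DataFrame(
    {"a": ["Builder", "left", "SI_NAME lide_on", "SI_ID 456", "Scheduling Info"]}
)

# findall returns a list of matches per row (an empty list when nothing matches)
df["b"] = df["a"].str.findall(r"^SI_\w+")

# Join each list into a plain string; empty lists become ""
df["c"] = df["b"].apply(",".join)

print(df)
```

Rows without an SI_ prefix end up with an empty list in b and an empty string in c, so they can be filtered out afterwards if needed.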
