Chris

Reputation: 1391

How to grab a string inside a pandas dataframe using a regex

I am trying to regex out a certain string inside my pandas df. Say I have a df like so:

         a                  b
0  foo foo AA123 bar        4
1  foo foo BB245 bar        5
2  foo CA234 bar bar        5

How would I get this df:

     a          b
0  AA123        4
1  BB245        5
2  CA234        5

One method I tried was df.replace({'(\w{3}\d{3})': ?}), but I wasn't sure what to put for the second parameter.

Upvotes: 1

Views: 62

Answers (1)

sparc_spread

Reputation: 10833

You could use the regex-based Series.str.extract function to keep just the matching group. Your regex also needs a fix: the quantifier on \w should be 2, not 3, because each code is two letters followed by three digits. In the end the code would be:

df["a"] = df["a"].str.extract(r'(\w{2}\d{3})', expand=False)

Passing expand=False tells str.extract to return a Series instead of a DataFrame. By default it returns a DataFrame with one column per regex group, so that it can accommodate multiple groups. Since you already know there is just one group here, expand=False gives you a Series you can assign straight back to df["a"]. If there were more than one regex group, the function would return a DataFrame no matter what you specified for expand, and you would index into it to get the column/group you wanted.
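Here is a minimal sketch of the above against the sample data from the question. The frame is rebuilt by hand, and the variable names (pattern, as_series, as_frame, two_groups) are only illustrative. The two-group pattern is an assumption added to show the multi-group case; it is not something the question asked for.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "a": ["foo foo AA123 bar", "foo foo BB245 bar", "foo CA234 bar bar"],
    "b": [4, 5, 5],
})

# Raw string so the backslashes reach the regex engine untouched.
pattern = r"(\w{2}\d{3})"

# One capture group + expand=False -> a Series.
as_series = df["a"].str.extract(pattern, expand=False)

# One capture group + default expand=True -> a one-column DataFrame.
as_frame = df["a"].str.extract(pattern)

# Two capture groups -> a DataFrame even with expand=False,
# with one column per group (labelled 0 and 1).
two_groups = df["a"].str.extract(r"(\w{2})(\d{3})", expand=False)

# Overwrite column "a" with the extracted codes.
df["a"] = as_series
print(df)
```

Rows where the pattern does not match would come back as NaN, so it may be worth checking df["a"].isna() afterwards if the real data is less regular than the sample.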

Upvotes: 3
