Reputation: 1391
I am trying to regex out a certain string inside my pandas df. Say I have a df like so:
                   a  b
0  foo foo AA123 bar  4
1  foo foo BB245 bar  5
2  foo CA234 bar bar  5
How would I get this df:
       a  b
0  AA123  4
1  BB245  5
2  CA234  5
One method I tried was df.replace({'(\w{3}\d{3})': ?}), but I wasn't sure what to put for the second parameter.
Upvotes: 1
Views: 62
Reputation: 10833
You could use the regex-based Series.str.extract function to keep just the matching group. Your regex also needs a fix: the repetition count for \w should be 2, not 3, because each code in your data is two characters followed by three digits. In the end the code would be:
df["a"] = df["a"].str.extract(r'(\w{2}\d{3})', expand=False)
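For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same call. The column values are copied from the question; the raw-string prefix (r"...") is my addition so the backslashes reach the regex engine unchanged.

```python
import pandas as pd

# Sample frame reconstructed from the question.
df = pd.DataFrame({
    "a": ["foo foo AA123 bar", "foo foo BB245 bar", "foo CA234 bar bar"],
    "b": [4, 5, 5],
})

# Keep only the first substring matching two word characters + three digits.
# expand=False returns a Series because the pattern has a single group.
df["a"] = df["a"].str.extract(r"(\w{2}\d{3})", expand=False)

print(df)
```

Note that \w also matches digits and underscores, so if the codes are always letters followed by digits, a stricter class such as [A-Z]{2} may be a safer choice.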
Passing expand=False tells str.extract not to return a DataFrame, which it does by default in order to accommodate multiple regex groups (it returns one column per group). Since you already know there is just one regex group here, expand=False conveniently gives you back a Series that you can immediately assign to df["a"]. If there were more than one regex group, the function would return a DataFrame no matter what you specified for expand, and you would index into it to get the column/group you wanted.
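To make the return types concrete, here is a small sketch comparing the three cases. The Series s and the two-group pattern are made up for illustration and are not from the question; the default of expand=True is what I would expect on recent pandas versions.

```python
import pandas as pd

# Hypothetical input; the last row has no match on purpose.
s = pd.Series(["foo foo AA123 bar", "foo CA234 bar bar", "no code here"])

# One group, expand=False: a Series (NaN where nothing matched).
one_group = s.str.extract(r"(\w{2}\d{3})", expand=False)

# One group, default expand: a one-column DataFrame.
as_frame = s.str.extract(r"(\w{2}\d{3})")

# Two groups: a DataFrame even with expand=False, one column per group.
two_groups = s.str.extract(r"(\w{2})(\d{3})", expand=False)

# Unnamed groups get integer column labels, so index by position.
letters = two_groups[0]

print(type(one_group).__name__, type(as_frame).__name__, type(two_groups).__name__)
```

Rows with no match come back as NaN rather than raising, so it may be worth checking for missing values before relying on the result.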
Upvotes: 3