Sara Daniel

Reputation: 165

Separate strings with regex and Pandas

I have the content below and need to extract the third part of each string, as shown, with Pandas in Python:

My string:

FA0003 -BL- FA0005-BL
FA0004-BL-FA0008-BL

My expected output:

FA0005
FA0008

Suppose the strings are in a column named A. The regex below retrieves FA0003 from the following string, but I don't know how to retrieve FA0005.

FA0003 -BL- FA0005-BL
df['A'].str.extract(r'(\w+\s*)', expand=False)
FA0003

Upvotes: 2

Views: 60

Answers (1)

Wiktor Stribiżew

Reputation: 626802

You can use

^(?:[^-]*-){2}\s*([^-]+)

See the regex demo

In Pandas, use it with your current code:

df['A'].str.extract(r'^(?:[^-]*-){2}\s*([^-]+)', expand=False)

Details

  • ^ - start of string
  • (?:[^-]*-){2} - two occurrences of zero or more chars other than - followed by a -
  • \s* - zero or more whitespaces (this is used to trim the output)
  • ([^-]+) - Capturing group 1 (the return value): one or more chars other than -.
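
As a quick check, here is a minimal sketch that applies the pattern to the two sample strings from the question. The DataFrame construction and the new column name third are assumptions made for illustration; only the column name A and the regex come from the post.

```python
import pandas as pd

# Sample data from the question, placed in a column named "A" (assumed layout).
df = pd.DataFrame({"A": ["FA0003 -BL- FA0005-BL", "FA0004-BL-FA0008-BL"]})

# Skip two hyphen-terminated chunks, drop leading whitespace,
# then capture everything up to the next hyphen.
pattern = r'^(?:[^-]*-){2}\s*([^-]+)'

# expand=False returns a Series because the pattern has one capturing group.
# "third" is a hypothetical column name used only for this example.
df["third"] = df["A"].str.extract(pattern, expand=False)

print(df["third"].tolist())  # ['FA0005', 'FA0008']
```

Rows with fewer than two hyphens do not match and come back as NaN, so it may be worth checking for missing values afterwards.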

Upvotes: 3
