Camue

Reputation: 481

Delete part of a string before a specific pattern

I have a pandas DataFrame with a column from which I need to extract specific names. The problem is that the names are not always in the same position and the values in that column do not all have the same length, so I cannot use the split function. However, I have noticed that the names are always preceded by a combination of 4 to 7 digits, which I believe is the identifier for the name.
How can I use a regular expression to go through that column and retrieve the names I need? Here is an example from the Jupyter notebook:

 df['info']
 csx_Gb009_broken screen_231400_Iphone 7
 000345_SamsungS8_tfes_Vodafone_is56t34_3G
 Ins45_56003_Huawei P8_

What I want is something like this:

 df['Phones']
 Iphone 7
 SamsungS8
 Huawei P8

In other words, each name comes right after a combination of 4 to 7 digits and an underscore, and it ends at the next underscore (or at the end of the string).

Upvotes: 1

Views: 105

Answers (1)

Wiktor Stribiżew

Reputation: 626794

You may use

df['Phones'] = df['info'].str.extract(r'\d{4}_([^_]+)')

The pattern matches:

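  • \d{4} - 4 digits (for a 4 to 7 digit identifier, these are its last four digits, since the underscore must follow immediately)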
  • _ - an underscore
  • ([^_]+) - Capturing group 1 (this value will be returned by str.extract): one or more chars other than _.

See the regex demo.
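As a quick check of the pattern above, here is a minimal sketch that runs it with Python's built-in re module on the three sample values from the question. It also tries a stricter variant, `(?<!\d)\d{4,7}_([^_]+)`, which is my own assumption and not part of the answer: it requires a standalone run of 4 to 7 digits. The helper name first_group is hypothetical.

```python
import re

# Sample values copied from the question's df['info'] column.
info = [
    "csx_Gb009_broken screen_231400_Iphone 7",
    "000345_SamsungS8_tfes_Vodafone_is56t34_3G",
    "Ins45_56003_Huawei P8_",
]

# Pattern from the answer: four digits, an underscore, then the captured name.
loose = re.compile(r"\d{4}_([^_]+)")

# Hypothetical stricter variant (an assumption, not from the answer):
# a run of 4 to 7 digits that is not preceded by another digit.
strict = re.compile(r"(?<!\d)\d{4,7}_([^_]+)")


def first_group(pattern, text):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = pattern.search(text)
    return match.group(1) if match else None


phones_loose = [first_group(loose, s) for s in info]
phones_strict = [first_group(strict, s) for s in info]

print(phones_loose)   # ['Iphone 7', 'SamsungS8', 'Huawei P8']
print(phones_strict)  # ['Iphone 7', 'SamsungS8', 'Huawei P8']
```

Both patterns agree on this sample. They would differ on identifiers longer than 7 digits, which the stricter variant rejects. In pandas, either pattern can be passed to df['info'].str.extract(...); adding expand=False should return a Series rather than a one-column DataFrame.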

Upvotes: 1
