user2900014

Reputation: 23

String extraction in Python / Pandas with repeated delimiter

I have a data frame with a column that includes any combination of one or many variables, separated by a '/' delimiter, e.g.:

Rd/MLERS
Rd
Rd          
Rd/DLEPC/DLERS
SLERS
MLERS

and so on. I want to extract the primary classifier, i.e. the only variable, or the first variable (the one immediately preceding the first '/' character). I don't have much experience with str.extract, and my attempt:

df["primaryEjecta1"] = df["MORPHOLOGY_EJECTA_1"].str.extract('(.*)/', expand=True)

does not work as anticipated:

Rd
NaN
NaN
Rd/DLEPC
NaN
NaN

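Specifically, rows with no '/' come back as NaN, and rows with more than one '/' return everything up to the last '/' rather than the first.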

I'm sure this is simple to fix if you know how, but most of the examples and tutorials I have found online assume nice, neat delimiters that are not repeated, so I'd appreciate any help you can offer.

Upvotes: 2

Views: 1048

Answers (2)

MaxU - stand with Ukraine

Reputation: 210912

You can use the powerful str.extract() method:

In [31]: df
Out[31]:
              txt
0        Rd/MLERS
1              Rd
2              Rd
3  Rd/DLEPC/DLERS
4           SLERS
5           MLERS

In [32]: df['clsfr'] = df['txt'].str.extract(r'([^\/]+)', expand=True)

In [33]: df
Out[33]:
              txt  clsfr
0        Rd/MLERS     Rd
1              Rd     Rd
2              Rd     Rd
3  Rd/DLEPC/DLERS     Rd
4           SLERS  SLERS
5           MLERS  MLERS

Explanation:

The RegEx ([^\/]+) captures one or more characters other than / into the first group, so the match ends just before the first / (or at the end of the string if there is none).
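
For comparison with the pattern from the question, here is a minimal self-contained sketch. The frame and the column names (txt, greedy, clsfr) are made up for illustration; the unescaped [^/] should behave the same as [^\/] in Python, and expand=False is used only so that a Series comes back instead of a one-column DataFrame.

```python
import pandas as pd

# Illustrative data copied from the question; column names are hypothetical.
df = pd.DataFrame(
    {"txt": ["Rd/MLERS", "Rd", "Rd", "Rd/DLEPC/DLERS", "SLERS", "MLERS"]}
)

# Pattern from the question: '.*' is greedy and the '/' is mandatory, so the
# group runs up to the LAST '/', and rows without any '/' do not match (NaN).
df["greedy"] = df["txt"].str.extract(r"(.*)/", expand=False)

# Negated character class: the first run of one or more non-'/' characters.
df["clsfr"] = df["txt"].str.extract(r"([^/]+)", expand=False)

print(df)
```

One caveat: if a value could start with '/', this pattern would skip the leading slash and return the next piece, so it is worth checking the data for that case.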

Upvotes: 1

EdChum

Reputation: 394189

Use str.split and str[0] to access the first piece of the split; this still returns the original string when the separator is absent:

In [121]:
df["primaryEjecta1"] = df['text'].str.split('/').str[0]
df

Out[121]:
             text primaryEjecta1
0        Rd/MLERS             Rd
1              Rd             Rd
2              Rd             Rd
3  Rd/DLEPC/DLERS             Rd
4           SLERS          SLERS
5           MLERS          MLERS
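
A possible variation on the same idea, sketched under the assumption that the real column may also contain missing values or stray whitespace (the sample Series below is invented for illustration). Passing n=1 limits the split to the first '/', which avoids building pieces that are thrown away.

```python
import pandas as pd

# Hypothetical sample: one padded value and one missing value added on purpose.
s = pd.Series(["Rd/MLERS", "Rd   ", "Rd/DLEPC/DLERS", "SLERS", None])

# n=1 stops after the first '/', so at most two pieces are created per row.
first = s.str.split("/", n=1).str[0]

# Whitespace survives the split, so strip it in a separate step.
first = first.str.strip()

print(first)
```

Missing values stay missing throughout, since the .str accessor methods propagate them rather than raising.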

Upvotes: 2
