Reputation: 23
I have a data frame with a column that includes any combination of one or many variables, separated by a '/' delimiter, e.g.:
Rd/MLERS
Rd
Rd
Rd/DLEPC/DLERS
SLERS
MLERS
Etc. I want to extract the primary classifier, i.e. the only variable, or the first variable preceding the first '/' character. I don't have much experience with str.extract, and my attempt:
df["primaryEjecta1"] = df["MORPHOLOGY_EJECTA_1"].str.extract('(.*)/', expand=True)
does not work as anticipated:
Rd
NaN
NaN
Rd/DLEPC
NaN
NaN
Specifically, rows without a '/' come back as NaN, and rows with more than one '/' return everything before the last '/' rather than the first.
I'm sure this is simple to fix if you know how, but most of the examples and tutorials I have found online assume neat delimiters that are not repeated, so I would appreciate any help you can offer.
Upvotes: 2
Views: 1048
Reputation: 210912
You can use the str.extract() method:
In [31]: df
Out[31]:
txt
0 Rd/MLERS
1 Rd
2 Rd
3 Rd/DLEPC/DLERS
4 SLERS
5 MLERS
In [32]: df['clsfr'] = df['txt'].str.extract(r'([^\/]+)', expand=True)
In [33]: df
Out[33]:
txt clsfr
0 Rd/MLERS Rd
1 Rd Rd
2 Rd Rd
3 Rd/DLEPC/DLERS Rd
4 SLERS SLERS
5 MLERS MLERS
Explanation:
The regex ([^\/]+) captures one or more characters other than / into the first group, so the match ends just before the first occurrence of / (or at the end of the string if there is none).
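To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the negated character class. The sample Series `s` is made up for illustration, and `expand=False` is used only so that each result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Hypothetical sample data mirroring the values in the question.
s = pd.Series(["Rd/MLERS", "Rd", "Rd/DLEPC/DLERS", "SLERS"])

# Pattern from the question: it requires a '/' to be present, and the
# greedy .* runs up to the LAST '/', so rows without '/' become NaN.
greedy = s.str.extract(r"(.*)/", expand=False)

# Negated character class: consumes characters until the first '/'
# or the end of the string, so every non-empty row produces a match.
first = s.str.extract(r"([^/]+)", expand=False)

print(pd.DataFrame({"txt": s, "greedy": greedy, "first": first}))
```

Inside a character class the / does not need escaping, so [^/] and [^\/] behave the same here.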
Upvotes: 1
Reputation: 394189
Use str.split and str[0] to access the first piece of the split; this still returns the original string when the separator is absent:
In [121]:
df["primaryEjecta1"] = df['text'].str.split('/').str[0]
df
Out[121]:
text primaryEjecta1
0 Rd/MLERS Rd
1 Rd Rd
2 Rd Rd
3 Rd/DLEPC/DLERS Rd
4 SLERS SLERS
5 MLERS MLERS
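As a small variation, and only a sketch: since just the first piece is needed, passing n=1 limits the work to a single split per row. The frame below is made up for illustration, with a missing value added to show that it passes through as NaN rather than raising.

```python
import pandas as pd

# Hypothetical frame; the None row stands in for a missing value.
df = pd.DataFrame({"text": ["Rd/MLERS", "Rd", "Rd/DLEPC/DLERS", None]})

# Split at most once on '/', then take element 0 of each resulting list.
# Rows without '/' yield a one-element list, so str[0] is the whole string.
df["primaryEjecta1"] = df["text"].str.split("/", n=1).str[0]

print(df)
```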
Upvotes: 2