EJ Kang

Reputation: 495

Python dataframe: simple string split that includes '-'

Happy New Year guys. I have a DataFrame that contains ints and strings in its columns. In my string column, some values contain '-' in the middle, and I want to delete everything from the '-' onward. Take a look at my df below.

input:
    buzz_id     facet   facet_cls   facet_val   p_buzz_date
0   95713207    A3      Small           MN        20160101
1   95713207    S3      Small-box       Tbd       20160101
2   95713207    F1      Medium          es        20160101
3   95713207    A2      Medium-box      esf       20160101
4   95713207    A1      Dum-pal         ess       20160101
...


output:
    buzz_id     facet   facet_cls   facet_val   p_buzz_date
0   95713207    A3      Small           MN        20160101
1   95713207    S3      Small           Tbd       20160101
2   95713207    F1      Medium          es        20160101
3   95713207    A2      Medium          esf       20160101
4   95713207    A1      Dum             ess       20160101
...

So in my 'facet_cls' column, everything after the '-' (including the '-' itself) needs to be deleted. My data is also very big, so I was hoping to use the fastest approach I can find. Any ideas?

Thanks in advance!

Upvotes: 3

Views: 91

Answers (2)

Sociopath

Reputation: 13426

You can also do it with a lambda expression, as follows:

df['facet_cls'] = df['facet_cls'].apply(lambda x:x.split('-')[0])
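A self-contained sketch of the lambda approach, using a small frame modeled on the question's data. One thing to watch for: a plain `x.split('-')` raises an AttributeError if the column holds missing values, because NaN is a float. The helper name `before_dash` is made up for this illustration, and whether your column has missing values is an assumption you would need to check.

```python
import pandas as pd

# Sample data modeled on the 'facet_cls' column from the question
df = pd.DataFrame(
    {"facet_cls": ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"]}
)


def before_dash(value):
    """Return the text before the first '-'; pass non-strings through unchanged."""
    # before_dash is a hypothetical helper, not part of pandas
    return value.split("-")[0] if isinstance(value, str) else value


# Same idea as the lambda above, but tolerant of missing values
df["facet_cls"] = df["facet_cls"].apply(before_dash)
print(df["facet_cls"].tolist())
```

If the column is guaranteed to contain only strings, the one-line lambda is enough.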

Upvotes: 1

jezrael

Reputation: 863531

Use str.split, then select only the first value of each list with str[0]:

df['facet_cls'] = df['facet_cls'].str.split('-').str[0]
print (df)
    buzz_id facet facet_cls facet_val  p_buzz_date
0  95713207    A3     Small        MN     20160101
1  95713207    S3     Small       Tbd     20160101
2  95713207    F1    Medium        es     20160101
3  95713207    A2    Medium       esf     20160101
4  95713207    A1       Dum       ess     20160101

Detail:

print (df['facet_cls'].str.split('-'))
0          [Small]
1     [Small, box]
2         [Medium]
3    [Medium, box]
4       [Dum, pal]
Name: facet_cls, dtype: object
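Since the question asks about speed on a large frame, here are two variants that may be worth timing on your own data. This is only a sketch: the first passes `n=1` so the split stops at the first '-', and the second uses a plain list comprehension with Python's `str.partition`. The second assumes every value in the column is a string. No timings are claimed here, and the variable names are illustrative.

```python
import pandas as pd

# Sample data modeled on the 'facet_cls' column from the question
df = pd.DataFrame(
    {"facet_cls": ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"]}
)

# Variant 1: stop after the first '-' instead of splitting on every one
first_split = df["facet_cls"].str.split("-", n=1).str[0]

# Variant 2: plain-Python loop over the values (assumes all values are strings)
partitioned = pd.Series(
    [value.partition("-")[0] for value in df["facet_cls"]],
    index=df.index,
    name="facet_cls",
)

print(first_split.tolist())
print(partitioned.tolist())
```

Both produce the same column for this sample. Which one is faster depends on the data and the pandas version, so measure before committing to either.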

Upvotes: 2
