Reputation: 495
Happy New Year, everyone. I have a DataFrame with both integer and string columns. In one of the string columns, some values contain a '-' in the middle, and I want to delete everything from the '-' onward. Take a look at my df below.
input:
    buzz_id  facet   facet_cls  facet_val  p_buzz_date
0  95713207     A3       Small         MN     20160101
1  95713207     S3   Small-box        Tbd     20160101
2  95713207     F1      Medium         es     20160101
3  95713207     A2  Medium-box        esf     20160101
4  95713207     A1     Dum-pal        ess     20160101
...
output:
    buzz_id  facet  facet_cls  facet_val  p_buzz_date
0  95713207     A3      Small         MN     20160101
1  95713207     S3      Small        Tbd     20160101
2  95713207     F1     Medium         es     20160101
3  95713207     A2     Medium        esf     20160101
4  95713207     A1        Dum        ess     20160101
...
So in my 'facet_cls' column, everything from the '-' onward (including the '-' itself) needs to be deleted. My data is also very large, so I am hoping to use the fastest approach I can find. Any ideas?
Thanks in advance!
Upvotes: 3
Views: 91
Reputation: 13426
You can also do it with a lambda expression, as follows:
df['facet_cls'] = df['facet_cls'].apply(lambda x:x.split('-')[0])
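For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the five sample rows from the question and applies the same lambda. The `1` passed to `split` is my addition (it stops splitting after the first '-'), and the sketch assumes every value in the column is a string; a NaN would raise an AttributeError here.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "buzz_id": [95713207] * 5,
    "facet": ["A3", "S3", "F1", "A2", "A1"],
    "facet_cls": ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal"],
    "facet_val": ["MN", "Tbd", "es", "esf", "ess"],
    "p_buzz_date": [20160101] * 5,
})

# Keep only the text before the first '-'.
# Assumes all values are str; missing values would need separate handling.
df["facet_cls"] = df["facet_cls"].apply(lambda x: x.split("-", 1)[0])

print(df["facet_cls"].tolist())
# ['Small', 'Small', 'Medium', 'Medium', 'Dum']
```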
Upvotes: 1
Reputation: 863531
Use str.split and then select only the first value of each list with str[0]:
df['facet_cls'] = df['facet_cls'].str.split('-').str[0]
print (df)
    buzz_id  facet  facet_cls  facet_val  p_buzz_date
0  95713207     A3      Small         MN     20160101
1  95713207     S3      Small        Tbd     20160101
2  95713207     F1     Medium         es     20160101
3  95713207     A2     Medium        esf     20160101
4  95713207     A1        Dum        ess     20160101
Detail:
print (df['facet_cls'].str.split('-'))
0          [Small]
1     [Small, box]
2         [Medium]
3    [Medium, box]
4       [Dum, pal]
Name: facet_cls, dtype: object
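Since the question asks about speed on a large frame, here is a rough sketch of three equivalent ways to keep the text before the first '-'. The variable names and the extra "a-b-c" value are made up for illustration. I have not benchmarked these; which one is fastest probably depends on your data and pandas version, so it is worth timing them on a sample of the real column.

```python
import pandas as pd

# Hypothetical sample; "a-b-c" is added to show behaviour with several hyphens.
s = pd.Series(
    ["Small", "Small-box", "Medium", "Medium-box", "Dum-pal", "a-b-c"],
    name="facet_cls",
)

# 1) Vectorised split, limited to one split so less work is done per value.
via_split = s.str.split("-", n=1).str[0]

# 2) Regex replace: drop the first '-' and everything after it.
via_regex = s.str.replace(r"-.*", "", regex=True)

# 3) Plain list comprehension; assumes every value is a str (no NaN).
via_listcomp = pd.Series(
    [v.split("-", 1)[0] for v in s], index=s.index, name=s.name
)

print(via_split.tolist())
# ['Small', 'Small', 'Medium', 'Medium', 'Dum', 'a']
```

The two `.str` variants pass missing values through as NaN, while the list comprehension would fail on them, so it is only an option if the column is known to be clean.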
Upvotes: 2