Reputation: 21
I have this dataset where there is a column named 'Discount' and the values are given as '20% off', '25% off' etc.
What i want is to keep just the number in the column and remove the % symbol and the 'off' string.
I'm using this formula to do it.
df['discount'] = df['discount'].apply(lambda x: x.lstrip('%').rstrip('off')
However, when i apply that formula, all the values in the column 'discount' becomes "nan".
I even used this formula as well,
df['discount'] = df['discount'].str.replace('off' , '')
However, this does the same thing.
Is there any other way of handling this? I just want to make all the values in that column to be just the number which is like 25, 20, 10 and get rid of that percentage sign and the string value.
Upvotes: 1
Views: 513
Reputation: 260380
If you have a fixed % off
suffix, the most efficient is to just remove the last 5 characters:
d['discount'] = d['discount'].str[:-5].astype(int)
Upvotes: 1
Reputation: 71
I came up with this solution:
d['discount'] = d['discount'].split("%")[0]
or as int:
d['discount'] = int(d['discount'].split("%")[0])
We chop the string in two pieces at the %-sign and then take the first part, the number.
Upvotes: 1
Reputation:
Try this:
d['discount'] = d['discount'].str.replace(r'(%|\s*off)', '', regex=True).astype(int)
Output:
>>> df
discount
0 20
1 25
Upvotes: 1