Swat1999
Swat1999

Reputation: 21

How to strip values from columns

I have this dataset where there is a column named 'Discount' and the values are given as '20% off', '25% off' etc.

What i want is to keep just the number in the column and remove the % symbol and the 'off' string.

I'm using this formula to do it.

df['discount'] = df['discount'].apply(lambda x: x.lstrip('%').rstrip('off')

However, when i apply that formula, all the values in the column 'discount' becomes "nan".

I even used this formula as well,

df['discount'] = df['discount'].str.replace('off' , '')

However, this does the same thing.

Is there any other way of handling this? I just want to make all the values in that column to be just the number which is like 25, 20, 10 and get rid of that percentage sign and the string value.

Upvotes: 1

Views: 513

Answers (3)

mozway
mozway

Reputation: 260380

If you have a fixed % off suffix, the most efficient is to just remove the last 5 characters:

d['discount'] = d['discount'].str[:-5].astype(int)

Upvotes: 1

stacked
stacked

Reputation: 71

I came up with this solution:

d['discount'] = d['discount'].split("%")[0]

or as int:

d['discount'] = int(d['discount'].split("%")[0])

We chop the string in two pieces at the %-sign and then take the first part, the number.

Upvotes: 1

user17242583
user17242583

Reputation:

Try this:

d['discount'] = d['discount'].str.replace(r'(%|\s*off)', '', regex=True).astype(int)

Output:

>>> df
  discount
0       20
1       25

Upvotes: 1

Related Questions