Treeboy

Reputation: 39

Check whether a DataFrame string ends with a number and remove it

Say I have a column foo in a dataframe df, which looks like:

0      abc1
1       def
2    g3sse1
3    f32asd

I want to remove the number at the end, if there is one, so that the result looks like this:

0       abc
1       def
2     g3sse
3    f32asd

The best I can do is:

df.loc[df['foo'].str[-1].str.isdigit(), 'foo'] = df['foo'].str[:-1]

This solves the problem, but I am curious whether there is a more elegant way to do it. I guess a regex won't make it look any better, but I appreciate any ideas!

Upvotes: 2

Views: 794

Answers (2)

sophocles

Reputation: 13821

Since you only need to drop trailing digits and would rather avoid regular expressions, you can use str.rstrip together with Python's string module. Note that rstrip removes every trailing digit, not just the last one:

import string
df['foo_refined'] = df['foo'].str.rstrip(string.digits)

      foo foo_refined
0    abc1         abc
1     def         def
2  g3sse1       g3sse
3  f32asd      f32asd

On plain strings, rstrip behaves like this:

>>> a = '12a'
>>> a.rstrip(string.digits)
'12a'

>>> b = '12a2'
>>> b.rstrip(string.digits)
'12a'

>>> c = '12a12x'
>>> c.rstrip(string.digits)
'12a12x'

>>> d = '123'
>>> d.rstrip(string.digits)
''

For reference, lstrip is the counterpart: used the same way, it strips digits from the start of the string instead of the end.
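One point that may matter here: the question's own snippet removes a single trailing character, while rstrip removes the whole trailing run of digits. The sketch below contrasts the two on plain strings without using a regex. The helper names strip_all_trailing_digits and strip_one_trailing_digit are hypothetical, chosen only for illustration.

```python
import string


def strip_all_trailing_digits(s: str) -> str:
    """Remove every trailing ASCII digit (the rstrip approach)."""
    return s.rstrip(string.digits)


def strip_one_trailing_digit(s: str) -> str:
    """Remove at most one trailing digit (mirrors the question's approach)."""
    # s[-1:] is '' for an empty string, and ''.isdigit() is False,
    # so empty input is returned unchanged.
    return s[:-1] if s[-1:].isdigit() else s


# "xyz12" is a hypothetical value ending in two digits.
for value in ["abc1", "def", "g3sse1", "f32asd", "xyz12"]:
    print(value, strip_all_trailing_digits(value), strip_one_trailing_digit(value))
```

For the sample data in the question both helpers give the same result; they only differ on values such as "xyz12", where more than one digit trails.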

Upvotes: 5

timgeb

Reputation: 78690

Your solution is fine. An alternative is:

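df['foo_new'] = df['foo'].str.extract(r'(.*)\d$', expand=False).fillna(df['foo'])

The raw string keeps \d from being treated as a string escape, and expand=False makes str.extract return a Series, so fillna(df['foo']) fills the non-matching rows index by index.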
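Another regex-based option, offered as a sketch rather than as part of the answer above, is to delete the match with str.replace instead of extracting everything before it. The frame below is rebuilt from the question, with one hypothetical extra row ("xyz12") to show the difference between the two patterns; the new column names are also only illustrative.

```python
import pandas as pd

# Sample frame rebuilt from the question; "xyz12" is a hypothetical extra row
# that ends in two digits.
df = pd.DataFrame({"foo": ["abc1", "def", "g3sse1", "f32asd", "xyz12"]})

# \d$ matches one digit at the end of the string, so only the last digit is removed.
df["last_digit_removed"] = df["foo"].str.replace(r"\d$", "", regex=True)

# \d+$ matches the whole trailing run of digits.
df["trailing_digits_removed"] = df["foo"].str.replace(r"\d+$", "", regex=True)

print(df)
```

Rows without a trailing digit are left unchanged, so no fillna step is needed with this approach.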

Upvotes: 3
