Reputation: 39
Say I have a column foo in a dataframe df, which looks like:
0 abc1
1 def
2 g3sse1
3 f32asd
I want to remove the number at the end, if there is one.
0 abc
1 def
2 g3sse
3 f32asd
Like this.
The best I can do is:
df.foo[df['foo'].str[-1].str.isdigit()] = df['foo'].str[:-1]
This solves the problem, but I am curious whether there is a more elegant way to do this. I guess a regex won't make it look any better, but I appreciate any ideas!
Upvotes: 2
Views: 794
Reputation: 13821
Since you only need to remove trailing digits and would rather not use regular expressions, you can also use rstrip together with Python's string module:
import string
df['foo_refined'] = df['foo'].str.rstrip(string.digits)
      foo  foo_refined
0    abc1          abc
1     def          def
2  g3sse1        g3sse
3  f32asd       f32asd
>>> a = '12a'
>>> a.rstrip(string.digits)
'12a'
>>> b = '12a2'
>>> b.rstrip(string.digits)
'12a'
>>> c = '12a12x'
>>> c.rstrip(string.digits)
'12a12x'
>>> d = '123'
>>> d.rstrip(string.digits)
''
For reference, there is also lstrip, which, as expected, would strip digits from the start of the string rather than from the end if used in this context.
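One difference worth keeping in mind: rstrip(string.digits) removes every trailing digit, whereas the slicing approach in the question removes at most one character. A minimal sketch on plain strings, assuming two hypothetical helper names (strip_all_trailing_digits and strip_one_trailing_digit are illustrative only, and 'abc12' is an extra sample not in the question):

```python
import string


def strip_all_trailing_digits(s):
    # rstrip removes *all* trailing characters found in string.digits
    return s.rstrip(string.digits)


def strip_one_trailing_digit(s):
    # Mirrors the question's logic: drop only the last character,
    # and only if it is a digit. s[-1:] is '' for an empty string,
    # and ''.isdigit() is False, so empty input is safe.
    return s[:-1] if s[-1:].isdigit() else s


# 'abc12' is a hypothetical extra sample that shows the difference.
for s in ["abc1", "def", "g3sse1", "f32asd", "abc12"]:
    print(repr(s), repr(strip_all_trailing_digits(s)), repr(strip_one_trailing_digit(s)))
```

For the four values in the question both helpers give the same result; they only diverge when a string ends in more than one digit.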
Upvotes: 5
Reputation: 78690
Your solution is fine. An alternative is:
df['foo_new'] = df['foo'].str.extract(r'(.*)\d$', expand=False).fillna(df['foo'])
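For comparison, a regex can also be applied with str.replace, which avoids the fillna step because non-matching rows are simply left unchanged. This is only a sketch: the column names foo_one and foo_all are made up for illustration, and the 'x12' row is a hypothetical addition to the question's data to show the difference between the two patterns.

```python
import pandas as pd

# The question's sample data plus one hypothetical row ('x12')
# that ends in more than one digit.
df = pd.DataFrame({"foo": ["abc1", "def", "g3sse1", "f32asd", "x12"]})

# Remove a single trailing digit, like the slicing code in the question.
df["foo_one"] = df["foo"].str.replace(r"\d$", "", regex=True)

# Remove the whole run of trailing digits, like rstrip(string.digits).
df["foo_all"] = df["foo"].str.replace(r"\d+$", "", regex=True)

print(df)
```

Passing regex=True explicitly matters here, since recent pandas versions treat the pattern as a literal string by default.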
Upvotes: 3