Reputation: 1288
I have a series of strings in a dataframe, and I want to get rid of everything in the string once a number starts. Here's an example:
sstrings: ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']
so, in the end, I want it to be:
sstrings: ['abc', 'def', 'dab', 'gh']
I thought about doing something like:
df['sstrings'] = df['sstrings'].str.split()
but since the leading number isn't always the same, I'm not sure how to make that work.
I saw this but that doesn't seem to work with a series.
Is there a way to do this without looping through the series and using re.split?
Upvotes: 1
Views: 496
Reputation: 44112
If the final part of each string consists only of digits, you can use:
>>> lst = ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']
>>> list(map(lambda txt: txt.rstrip("0123456789"), lst))
['abc', 'def', 'dab', 'gh']
or using list comprehension:
>>> [txt.rstrip("0123456789") for txt in lst]
['abc', 'def', 'dab', 'gh']
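The caveat above matters when digits also appear in the middle of a string. A minimal sketch of the difference, assuming a made-up third value 'ab12cd34' that is not in the original data: rstrip only trims digits from the right-hand end, while re.split with maxsplit=1 cuts at the first digit wherever it occurs.

```python
import re

# 'ab12cd34' is a hypothetical value added to show the difference.
lst = ["abc12390859", "def1959836", "ab12cd34"]

# rstrip only removes digits from the right-hand end of each string.
trailing_only = [txt.rstrip("0123456789") for txt in lst]

# re.split with maxsplit=1 cuts at the first digit, wherever it is,
# and [0] keeps the part before it.
first_digit = [re.split(r"\d", txt, maxsplit=1)[0] for txt in lst]

print(trailing_only)  # ['abc', 'def', 'ab12cd']
print(first_digit)    # ['abc', 'def', 'ab']
```

If the data always looks like the sample in the question, both approaches give the same result.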
Upvotes: 0
Reputation: 78740
You could use a regular expression. Demo:
>>> import re
>>> s = ['abc12390859', 'def1959836', 'dab3496876', 'gh34643267']
>>> ss = [re.match(r'[^\d]+', x).group(0) for x in s]
>>> ss
['abc', 'def', 'dab', 'gh']
Explanation:
\d matches any digit.
[^\d] matches anything that is not a digit.
[^\d]+ matches a sequence of one or more non-digits.
The documentation for re.match can be found here. It will return a MatchObject (from which we extract the matching string with group) if characters at the beginning of the string match our pattern [^\d]+, and None otherwise, so a string that starts with a digit would make the .group(0) call fail. re.match is applied to all x in your original list s with a list comprehension.
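Since the question is about a pandas Series, the same idea can probably be applied without an explicit loop through the vectorized .str accessor. This is a sketch, not a tested recipe for your data: it assumes the column is named 'sstrings' as in the question, and the 'prefix' and 'prefix_alt' column names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"sstrings": ["abc12390859", "def1959836", "dab3496876", "gh34643267"]}
)

# Keep only the leading run of non-digit characters.
# \D* (zero or more) also matches strings that start with a digit,
# giving an empty string instead of a failed match.
df["prefix"] = df["sstrings"].str.extract(r"^(\D*)", expand=False)

# Alternative: delete everything from the first digit onward.
df["prefix_alt"] = df["sstrings"].str.replace(r"\d.*$", "", regex=True)

print(df["prefix"].tolist())  # ['abc', 'def', 'dab', 'gh']
```

Both variants keep everything before the first digit; pick whichever reads more clearly to you.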
Upvotes: 3