Reputation: 63
I'm working with a set of data that has names and ID numbers combined in one string. For example, if a user was named "John Smith" and his ID number was 1234567, the string would be "John Smith --- 1234567". The strings are consistently formatted so that it is always:
NAME [space] 3 HYPHENS [space] ID number
I am trying to find a way to pull the ID numbers out of these strings. So far I have come up with this:
foo = "John Smith --- 1234567"
bar = [str(s) for s in foo.split() if s.isdigit()]
This gives me a list like this: ['1234567']. That works for my needs, but I'm wondering if there's a more "Pythonic"/clean way to do it. Is there a way to get just an int of the ID number returned, as opposed to a list with a string in it?
Upvotes: 1
Views: 904
Reputation: 22107
Appropriate use of regular expressions is "Pythonic":
>>> import re
>>> data = "John Smith --- 1234567"
>>> idtext = re.match(r'.* --- (\d+)$', data).group(1)
>>> int(idtext)
1234567
The regex asks for any sequence, followed by your " --- " marker, followed by digits and then the end of the line. That may be too restrictive, or not restrictive enough, depending on the actual data.
Whether that's appropriate for your situation, and whether you want any error handling to cover possible unexpected conditions, is your call. Note also re.findall() which would let you do this on input that had many of these lines, all at once.
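As a rough sketch of the re.findall() idea mentioned above, assuming the input is one string with one record per line (the sample names and IDs below are made up for illustration):

```python
import re

# Hypothetical multi-line input; these records are invented for the example.
text = """John Smith --- 1234567
Jane Doe --- 7654321
Route 66 Diner --- 42"""

# re.MULTILINE lets ^ and $ match at the start and end of every line.
# With a single capture group, findall() returns just the captured digits.
ids = [int(m) for m in re.findall(r'^.* --- (\d+)$', text, flags=re.MULTILINE)]
print(ids)  # [1234567, 7654321, 42]
```

Lines that do not match the pattern are silently skipped, which may or may not be what you want.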
As Brian M. Sheldon commented, using a string split() (or rsplit(), if you're looking for something at the end instead) is also "Pythonic" when it's appropriate, and that would look something like this:
>>> data = "John Smith --- 1234567"
>>> idtext = data.rsplit(' --- ', 1)[1]
>>> int(idtext)
1234567
I showed the regex version first because, in my experience, needing this for one line usually means you have a bunch of lines, and going from the one-liner to re.findall() is a bit simpler than iterating over the lines manually (with a for loop, a generator, or similar) and applying the split to each one.
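For comparison, the manual iteration with rsplit() might look something like the sketch below. The function name extract_ids and the choice to skip malformed lines are assumptions for illustration, not part of the original question:

```python
def extract_ids(lines, marker=' --- '):
    """Yield the integer after the last marker on each line (hypothetical helper)."""
    for line in lines:
        parts = line.strip().rsplit(marker, 1)
        if len(parts) != 2:
            continue  # marker not found on this line
        try:
            yield int(parts[1])
        except ValueError:
            continue  # text after the marker is not something int() can parse


sample = ["John Smith --- 1234567", "no marker here", "Bad --- abc", "Jane Doe --- 7654321\n"]
print(list(extract_ids(sample)))  # [1234567, 7654321]
```

Whether to skip bad lines, log them, or raise is your call; skipping is just the simplest thing to show here.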
Upvotes: 2
Reputation: 12927
If I understand your problem correctly…:
id = int(foo.split(' --- ')[-1])
First, your foo is split into a list of two parts, the text before and after ' --- '; then the last element of this list, which should be the ID, is converted to int.
Upvotes: 2
Reputation: 4685
You can use filter and str.isdigit:
''.join(filter(str.isdigit, foo))
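Two things are worth keeping in mind with this approach: the result is a string, so it still needs int(), and every digit in the string is kept, including any that appear in the name. A small sketch (the "tricky" value is a made-up example, not from the question's data):

```python
foo = "John Smith --- 1234567"
digits = ''.join(filter(str.isdigit, foo))
print(int(digits))  # 1234567

# Hypothetical record whose name part also contains digits.
tricky = "Agent 47 --- 1234567"
merged = ''.join(filter(str.isdigit, tricky))
print(merged)  # 471234567, the digits from the name are glued onto the ID
```

If names can never contain digits, this is fine; otherwise splitting on the ' --- ' marker is safer.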
Upvotes: 1
Reputation: 2983
You can use a regex for this case:
import re
foo = "John Smith --- 1234567"
id = re.search(r'\d+',foo).group()
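Note that re.search(r'\d+', ...) returns the first run of digits and returns it as a string. If a name could contain digits, anchoring the pattern to the end of the string is one possible way around that. A minimal sketch, using a made-up record:

```python
import re

# Hypothetical record whose name part also contains digits.
tricky = "Agent 47 --- 1234567"

first = re.search(r'\d+', tricky).group()   # first run of digits anywhere
last = re.search(r'\d+$', tricky).group()   # run of digits at the very end

print(first)      # 47
print(int(last))  # 1234567
```

Both calls assume a match exists; re.search() returns None otherwise, so .group() would raise AttributeError on input with no digits.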
Upvotes: 0
Reputation: 7409
How about:
bar = [int(s) for s in foo.split() if s.isdigit()]
instead?
Upvotes: 1