Reputation: 63

How to extract an ID number from a list of consistently formatted strings

I'm working with a set of data that has names and usernames combined in one string. For example, if a user was named "John Smith" and his ID number was 1234567, the string would be "John Smith --- 1234567". The strings always follow this format:

NAME [space] 3 HYPHENS [space] ID number

I am trying to find a way to pull the ID numbers out of these strings. I found that I can do something like this:

foo = "John Smith --- 1234567"

bar = [str(s) for s in foo.split() if s.isdigit()]

That gives me a list like this: ['1234567']. This will work for my needs, but I'm wondering if there's a more "Pythonic"/clean way to do this. Is there a way to get just an int of the ID number returned, as opposed to a list with a string in it?

Upvotes: 1

Answers (5)

Peter Hansen

Reputation: 22107

Appropriate use of regular expressions is "Pythonic":

>>> import re
>>> data = "John Smith --- 1234567"
>>> idtext = re.match(r'.* --- (\d+)$', data).group(1)
>>> int(idtext)
1234567

The regex asks for any sequence, followed by your " --- " marker, followed by digits and then the end of the line. That may be too restrictive, or not restrictive enough, depending on the actual data.

Whether that's appropriate for your situation, and whether you want any error handling to cover possible unexpected conditions, is your call. Note also re.findall() which would let you do this on input that had many of these lines, all at once.
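A minimal sketch of the re.findall() route, assuming the records arrive as one newline-separated string. The `text` sample and the extra names in it are made up for illustration. Note that re.MULTILINE is needed here so that `$` matches at the end of every line rather than only at the end of the whole string.

```python
import re

# Hypothetical sample input: one "NAME --- ID" record per line.
text = "John Smith --- 1234567\nJane Doe --- 7654321\nAnn Lee --- 42"

# With re.MULTILINE, $ matches at the end of each line, so findall()
# returns the captured digits from every record in a single call.
id_texts = re.findall(r'.* --- (\d+)$', text, flags=re.MULTILINE)

# findall() returns strings; convert each one to int.
ids = [int(s) for s in id_texts]
print(ids)  # [1234567, 7654321, 42]
```

Lines that do not match the pattern are silently skipped, which may or may not be what you want.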

As Brian M. Sheldon commented, using a string split() (or rsplit(), if you're looking for something at the end instead) is also "Pythonic" when it's appropriate, and that would look something like this:

>>> data = "John Smith --- 1234567"
>>> idtext = data.rsplit(' --- ', 1)[1]
>>> int(idtext)
1234567

I showed the regex version first because, in my experience, needing this for one line usually means you have a bunch of such lines, and going from the one-liner to re.findall() is a bit simpler than iterating over the lines manually (with a for loop, generator, or similar) and applying the split to each one.
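For comparison, iterating manually with the split approach might look roughly like this. It is only a sketch, assuming the records are already in a list; the `lines` sample is made up for illustration.

```python
# Hypothetical sample input: a list of "NAME --- ID" strings.
lines = ["John Smith --- 1234567", "Jane Doe --- 7654321"]

# rsplit() with maxsplit=1 splits only on the last " --- ", so the ID is
# always the second piece, even if the name itself contained the marker.
ids = [int(line.rsplit(' --- ', 1)[1]) for line in lines]
print(ids)  # [1234567, 7654321]
```

Unlike the regex version, this raises an IndexError or ValueError on a line that does not fit the format instead of skipping it.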

Upvotes: 2

Błotosmętek

Reputation: 12927

If I understand your problem correctly…:

id = int(foo.split(' --- ')[-1])

First, your foo is split into a list of two parts (before and after ' --- '); then the last element of this list, which should be the ID, is converted to int.
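If the data might not always be as clean as expected, the same idea can be wrapped in a small helper that checks the format first. This is only a sketch: the name `extract_id` and the choice to raise ValueError are assumptions for illustration, not something the question requires.

```python
def extract_id(record):
    """Return the ID at the end of a 'NAME --- ID' string as an int."""
    # rpartition() splits on the last ' --- '; sep is '' if the marker is absent.
    name, sep, id_text = record.rpartition(' --- ')
    # isdecimal() is True only for non-empty strings of decimal digits,
    # which int() is guaranteed to accept.
    if not sep or not id_text.isdecimal():
        raise ValueError(f"unexpected record format: {record!r}")
    return int(id_text)


print(extract_id("John Smith --- 1234567"))  # 1234567
```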

Upvotes: 2

coincoin

Reputation: 4685

You can use filter and str.isdigit:

''.join(filter(str.isdigit, foo))

Upvotes: 1

Logovskii Dmitrii

Reputation: 2983

You can use a regex for this case:

import re
foo = "John Smith --- 1234567"
id = re.search(r'\d+',foo).group()

Upvotes: 0

TomServo

Reputation: 7409

How about:

bar = [int(s) for s in foo.split() if s.isdigit()]

instead?

Upvotes: 1
