Sepatau

Reputation: 103

Python Regex for Numeric Pattern that has Dashes

I have a column in a pandas data frame called sample_id. Each entry contains a string, and from this string I'd like to pull a numeric pattern that will have one of two forms:

1-234-5-6789

or

123-4-5648

I'm having trouble defining the correct regex pattern for this. So far I have been experimenting with the following:

re.findall(pattern=r'\b2\w+', string=str(data['sample_id']))

But this only pulls values that start with 2, and only the first chunk of the numeric pattern. How do I express the above patterns with the dashes?

Upvotes: 1

Views: 3526

Answers (3)

The fourth bird

Reputation: 163352

You could use an optional group (?:\d-)? to match 1 digit and a hyphen, followed by \d{3}-\d-\d{4}, which matches the digit pattern shared by both examples.

(?:\d-)?\d{3}-\d-\d{4}


Instead of using a word boundary \b, if there cannot be a non-whitespace character before your value, you could prepend the regex with (?<!\S), and if there cannot be a non-whitespace character after it, you could add (?!\S) at the end.
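As a minimal sketch of the suggestion above, the pattern can be wrapped in both lookarounds and applied to each string with re.search. The sample strings and the helper name extract_id are invented for illustration and are not from the question.

```python
import re

# Pattern from the answer, wrapped in the whitespace lookarounds:
# (?<!\S) = no non-whitespace character directly before the match,
# (?!\S)  = no non-whitespace character directly after it.
PATTERN = re.compile(r'(?<!\S)(?:\d-)?\d{3}-\d-\d{4}(?!\S)')

# Hypothetical sample_id values, made up for this example.
samples = [
    'batch A 1-234-5-6789 plate 3',
    'run 123-4-5648',
    'x123-4-5648',  # preceded by a non-whitespace character, so rejected
]


def extract_id(text):
    """Return the first matching numeric pattern in text, or None."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


results = [extract_id(s) for s in samples]
print(results)  # ['1-234-5-6789', '123-4-5648', None]
```

For a pandas column, the same pattern string could probably be passed to data['sample_id'].str.findall(...), which works entry by entry rather than on the string form of the whole column.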

Upvotes: 1

Richard

Reputation: 61289

A vertical pipe | makes an OR in a regular expression, so you can use:

import re

test1 = '123-4-5648'
test2 = '1-234-5-6789'

# Two alternatives joined by |: the four-group form, then the three-group form.
pattern = r'[0-9]-[0-9]{3}-[0-9]-[0-9]{4}|[0-9]{3}-[0-9]-[0-9]{4}'
re.findall(pattern=pattern, string=test1)  # ['123-4-5648']
re.findall(pattern=pattern, string=test2)  # ['1-234-5-6789']

[0-9] matches a single digit in the range 0 through 9 (inclusive), {4} indicates that four such digits should occur in a row, - means a hyphen, and | means an OR and separates the two patterns you mention.
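To illustrate how the alternation behaves when both forms appear inside longer text, here is a small sketch. The sentence in text is a made-up example, not data from the question.

```python
import re

# Same alternation as above: four-group form | three-group form.
ALTERNATION = r'[0-9]-[0-9]{3}-[0-9]-[0-9]{4}|[0-9]{3}-[0-9]-[0-9]{4}'

# Hypothetical string containing one value of each form.
text = 'ids: 1-234-5-6789 and 123-4-5648'

# findall scans left to right and returns every non-overlapping match.
found = re.findall(ALTERNATION, text)
print(found)  # ['1-234-5-6789', '123-4-5648']
```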

Upvotes: 1

Arihant Bansal

Reputation: 166

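If there will only ever be at most one hyphen between two numbers, then ^[0-9]+(-[0-9]+)+$ would work well. It uses the normal*(special normal*)* pattern where normal is [0-9] and special is -. Note that the ^ and $ anchors mean the whole string has to be the numeric pattern, and that any group lengths are accepted.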
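A short sketch of what this more general pattern accepts and rejects may help. The candidate strings are invented for illustration.

```python
import re

# Anchored pattern from the answer: digit groups separated by single hyphens.
GENERIC = re.compile(r'^[0-9]+(-[0-9]+)+$')

# Hypothetical inputs, made up for this example.
candidates = [
    '1-234-5-6789',   # accepted
    '123-4-5648',     # accepted
    '1-2',            # accepted too: group lengths are not checked
    '12--34',         # rejected: two hyphens in a row
    'id 123-4-5648',  # rejected: ^ requires the string to start with a digit
    '1234',           # rejected: at least one hyphen is required
]

accepted = [c for c in candidates if GENERIC.match(c)]
print(accepted)  # ['1-234-5-6789', '123-4-5648', '1-2']
```

Because of the anchors, this form suits validating a value that has already been isolated; pulling the pattern out of a longer string would need the anchors removed or replaced.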

Upvotes: 0
