shash

Reputation: 511

Search for a string in a line using regex

I want to find a string of the form XXXXX_XXXXX, XXXXXX_XXXXX, or XXXXXX within a line, where each X is an alphanumeric character.

In other words, the part before the "_" is 5 or 6 characters long and the part after it is always 5 characters long; alternatively, the whole string is just 6 characters with no underscore. I am coding in Python.

Any help will be much appreciated.

Upvotes: 0

Views: 444

Answers (5)

bdeniker

Reputation: 1045

I quite like Michał Šrajer's answer, but, as has been pointed out, his version also matches just 5 alnum characters (which we don't want).

Here's an edit of his version to compensate for that:

re.match(r"[a-zA-Z0-9]{5}([a-zA-Z0-9]?_[a-zA-Z0-9]{5}|[a-zA-Z0-9])", c)

Though some of the other answers are probably more readable...
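For anyone who wants to check the edge cases, here is a minimal sketch. It assumes the whole string has to match, so it uses re.fullmatch rather than re.match, and the alternation has no optional group, so five bare characters are rejected. The names PATTERN and is_valid are mine, purely for illustration.

```python
import re

# Five alphanumerics, then either an optional sixth plus "_" and five more,
# or exactly one more alphanumeric (six in total).
PATTERN = re.compile(
    r"[a-zA-Z0-9]{5}(?:[a-zA-Z0-9]?_[a-zA-Z0-9]{5}|[a-zA-Z0-9])"
)

def is_valid(s):
    """Return True when the whole string has one of the three accepted forms."""
    return PATTERN.fullmatch(s) is not None

for candidate in ["abcde_12345", "abcdef_12345", "abcdef", "abcde", "abcdefg"]:
    print(candidate, is_valid(candidate))
```

The first three candidates are accepted; "abcde" and "abcdefg" are rejected.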

Upvotes: 0

Michał Šrajer

Reputation: 31192

import re and then:

re.match("[a-zA-Z0-9]{5,6}(_[a-zA-Z0-9]{5})?", c).group()

Note that the predefined \w class also matches "_", so you cannot use it here.
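A small sketch to illustrate both points: \w accepts the underscore while the explicit class does not, and re.match returns None when nothing matches, so calling .group() directly on the result can raise AttributeError. The helper name first_token is hypothetical.

```python
import re

ALNUM_PATTERN = re.compile(r"[a-zA-Z0-9]{5,6}(_[a-zA-Z0-9]{5})?")

# \w accepts the underscore; the explicit class does not.
underscore_is_word = re.fullmatch(r"\w", "_") is not None
underscore_is_alnum = re.fullmatch(r"[a-zA-Z0-9]", "_") is not None

def first_token(line):
    """Return the text matched at the start of line, or None if nothing matches."""
    m = ALNUM_PATTERN.match(line)
    return m.group() if m else None

print(underscore_is_word, underscore_is_alnum)
print(first_token("abcde_12345 rest"))
print(first_token("!!"))
```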

Upvotes: 1

user557597

Reputation:

If Python doesn't assume start and end boundary conditions by default,
or if you are searching for a string within a string, you may have to account for boundary conditions.
Otherwise, XXXXXXXXXXXXXXXXXXXXXX_XXXXXXXXXXXXXXXXXXXXXXX will be matched as well.

/ (?: ^ | [\W_] )              # beginning of line or non-alphanumeric
  (?:
       [^\W_]{5,6}_[^\W_]{5}   # 5-6 alphanumerics, underscore, 5 alphanumerics
    |  [^\W_]{6}               # or 6 alphanumerics
  )
  (?: [\W_] | $)               # non-alphanumeric or end of line
/x
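In Python, the same idea could be sketched roughly like this. Note that this is an adaptation, not a literal translation: it uses lookarounds instead of consuming the boundary characters, so each match contains only the token itself, and it spells the class out as [a-zA-Z0-9] (ASCII only). The names TOKEN and find_tokens are made up for the example.

```python
import re

# Sketch only: lookarounds replace the consuming boundary groups.
TOKEN = re.compile(
    r"""
    (?<![a-zA-Z0-9])                     # not preceded by an alphanumeric
    (?:
        [a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5}  # 5-6 alphanumerics, underscore, 5 alphanumerics
      | [a-zA-Z0-9]{6}                   # or exactly 6 alphanumerics
    )
    (?![a-zA-Z0-9])                      # not followed by an alphanumeric
    """,
    re.VERBOSE,
)

def find_tokens(line):
    """Return every token in line that has one of the accepted forms."""
    return TOKEN.findall(line)

print(find_tokens("id abcde_12345 ok"))
print(find_tokens("x" * 22 + "_" + "y" * 23))
```

The first call finds the single token abcde_12345; the second finds nothing, because the long runs fail the boundary checks.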

Upvotes: 0

jwd

Reputation: 11144

How about this?

([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6}

Full code example:

import re
pat = re.compile(r'^(([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6})$')
print(pat.match('xxxxx_xxxxx') is not None)    # True, 5 chars, underscore, 5 chars
print(pat.match('xxxxxx_xxxxx') is not None)   # True, 6 chars, underscore, 5 chars
print(pat.match('xxxxxx') is not None)         # True, 6 chars

NOTE: I previously wrote the following, not realizing that Python doesn't support POSIX character classes:

([[:alnum:]]{5,6}_[[:alnum:]]{5})|[[:alnum:]]{6}

Upvotes: 3

talnicolas

Reputation: 14051

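import re

# Python's re module has no POSIX classes such as [[:alnum:]], so the class is spelled out.
regex = re.compile(r"([a-zA-Z0-9]{5,6}_[a-zA-Z0-9]{5})|[a-zA-Z0-9]{6}")
here = regex.search("your string")
if here:
    print(here.group())  # pattern has been found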

Upvotes: 0
