naruto

Reputation: 53

regex to filter version from string

I have the following strings:

String 1-

Cisco IOS Software, C3900 Software (C3900-UNIVERSALK9-M), Version 15.4(3)M3, RELEASE SOFTWARE (fc2) ROM: System Bootstrap, Version 15.0(1r)M16, RELEASE SOFTWARE (fc1)

String2-

Cisco IOS XE Software, Version 16.05.01b
Cisco IOS Software [Everest], ISR Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.5.1b, RELEASE SOFTWARE (fc1)
licensed under the GNU General Public License ("GPL") Version 2.0.  The
software code licensed under GPL Version 2.0 is free software that comes
GPL code under the terms of GPL Version 2.0.  For more details, see the

From these strings I need to extract only 15.4(3)M3 and 16.05.01b with a regex.

I have tried r'((?<=Version\s)\d+\.\d+\(\d+...)', which fetches 15.4(3)M3 but not 16.05.01b,

and r'((?<=Version\s)\d+\.\d+\(\d+...)'

A single regular expression should be able to fetch the version from both strings, but neither attempt gives me that result.

Upvotes: 4

Views: 615

Answers (3)

Mattias

Reputation: 436

That's because your regex requires an opening parenthesis inside the version, and the version in the second string does not contain one.
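To see this concretely, here is a minimal check that runs the pattern from the question against one line from each string. The sample lines are copied from the question; the variable names are only illustrative.

```python
import re

# Pattern from the question: the literal \( makes a parenthesis mandatory.
pattern = re.compile(r'((?<=Version\s)\d+\.\d+\(\d+...)')

line1 = ('Cisco IOS Software, C3900 Software (C3900-UNIVERSALK9-M), '
         'Version 15.4(3)M3, RELEASE SOFTWARE (fc2)')
line2 = 'Cisco IOS XE Software, Version 16.05.01b'

match1 = pattern.search(line1)
match2 = pattern.search(line2)

print(match1.group(1) if match1 else None)  # 15.4(3)M3
print(match2)                               # None: no "(" follows "16.05"
```

The second search returns None because nothing after "16.05" can satisfy the mandatory `\(`.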

Here is an easy way to solve it (strings borrowed from abdusco's answer):

import re

strings = [
    '-M), Version 15.4(3)M3, RELEA',
    'rap, Version 15.0(1r)M16, RELEA',
    ', Version 16.5.1b, RELEASE']

versions = []
# Match the start of the version plus the next 8 characters,
# then drop everything from the first comma onwards.
version = re.compile(r'(?<=Version\s)\d+\.\d........')
for s in strings:
    v = version.search(s).group(0).split(',')[0]
    versions.append(v)

Upvotes: 1

abdusco

Reputation: 11081

In your examples a version is prefixed with Version and includes:

  • numbers
  • dots
  • parentheses
  • letters

Here, I model a version as something that starts with a digit and continues with any combination of the items above.

This should work:

import re
strings = [
    '-M), Version 15.4(3)M3, RELEA',
    'rap, Version 15.0(1r)M16, RELEA',
    ', Version 16.5.1b, RELEASE',
    're, Version 16.05.01b'
]
version_re = re.compile(r'version (\d[\w.()]+)', flags=re.IGNORECASE)
for s in strings:
    v = version_re.search(s).group(1)
    print(v)

output:

15.4(3)M3
15.0(1r)M16
16.5.1b
16.05.01b
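As a rough sketch of how the same pattern behaves on the full strings from the question, the helper below (a hypothetical `first_version`, not part of the answer above) returns only the first match. It assumes the wanted version is always the first "Version ..." in the text, which holds for the two samples but may not hold for other device output.

```python
import re

# Same pattern as in the answer above; IGNORECASE also matches "version".
version_re = re.compile(r'version (\d[\w.()]+)', flags=re.IGNORECASE)


def first_version(text):
    """Return the first version-like token after 'Version', or None."""
    m = version_re.search(text)
    return m.group(1) if m else None


# Strings copied from the question (string 2 shortened to three lines).
string1 = ('Cisco IOS Software, C3900 Software (C3900-UNIVERSALK9-M), '
           'Version 15.4(3)M3, RELEASE SOFTWARE (fc2) '
           'ROM: System Bootstrap, Version 15.0(1r)M16, RELEASE SOFTWARE (fc1)')
string2 = ('Cisco IOS XE Software, Version 16.05.01b\n'
           'Cisco IOS Software [Everest], ISR Software '
           '(X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.5.1b, '
           'RELEASE SOFTWARE (fc1)\n'
           'licensed under the GNU General Public License ("GPL") Version 2.0.  The')

print(first_version(string1))  # 15.4(3)M3
print(first_version(string2))  # 16.05.01b
```

Using `search` instead of `findall` means the later ROM and GPL "Version" mentions are ignored, and the `None` check avoids an AttributeError on text without any version.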

Upvotes: 3

The fourth bird

Reputation: 163277

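You could use an alternation to get both values.

You can also omit the capturing group, because with the lookbehind the version is the whole match. After the leading digits, dot and digits, the alternation matches either a single character (anything except a parenthesis or +) between parentheses followed by an uppercase letter A-Z and a digit, or a dot, 2 digits and a lowercase letter a-z.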

(?<=Version\s)\d+\.\d+(?:\([^()+]\)[A-Z]\d|\.\d{2}[a-z])


A more efficient version could use a capturing group instead of the lookbehind:

Version\s(\d+\.\d+(?:\([^()+]\)[A-Z]\d|\.\d{2}[a-z]))
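A minimal sketch of using that capturing-group pattern from Python, assuming a shortened sample text made up of lines from the question. With exactly one capturing group in the pattern, `re.findall` returns the group contents rather than the whole match, so the "Version " prefix is dropped automatically.

```python
import re

# Capturing-group variant from above; findall returns the group contents.
regex = r"Version\s(\d+\.\d+(?:\([^()+]\)[A-Z]\d|\.\d{2}[a-z]))"

# Shortened sample built from lines in the question.
sample = ("Version 15.4(3)M3, RELEASE SOFTWARE (fc2)\n"
          "Cisco IOS XE Software, Version 16.05.01b\n"
          "licensed under GPL Version 2.0 is free software")

versions = re.findall(regex, sample)
print(versions)  # ['15.4(3)M3', '16.05.01b']
```

The "Version 2.0" line is skipped because neither branch of the alternation can match after "2.0".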


import re

regex = r"(?<=Version\s)\d+\.\d+(?:\([^()+]\)[A-Z]\d|\.\d{2}[a-z])"

test_str = ("String 1-Cisco IOS Software, C3900 Software (C3900-UNIVERSALK9-M), Version 15.4(3)M3, RELEASE SOFTWARE (fc2)\n"
    "ROM: System Bootstrap, Version 15.0(1r)M16, RELEASE SOFTWARE (fc1)\n\n"
    "String2-Cisco IOS XE Software, Version 16.05.01b\n"
    "Cisco IOS Software [Everest], ISR Software (X86_64_LINUX_IOSD-UNIVERSALK9-M), Version 16.5.1b, RELEASE SOFTWARE (fc1)\n"
    "licensed under the GNU General Public License (\"GPL\") Version 2.0.  The\n"
    "software code licensed under GPL Version 2.0 is free software that comes\n"
    "GPL code under the terms of GPL Version 2.0.  For more details, see the")

print(re.findall(regex, test_str))

Result

['15.4(3)M3', '16.05.01b']

Upvotes: 1
