Ashish Kumar

Reputation: 136

Extract a pattern from different strings in Python

I have a few file names:

xyz-1.23.35.10.2.rpm

xyz-linux-version-90.12.13.689.tar.gz

xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm

Here xyz can be any string of any length (letters only, no digits).

The dot-separated numbers are the version of each file.

Can I have one common function that extracts the version from each filename? I tried, but my function is getting too big and relies heavily on hard-coded constants. Please suggest a simple way.

Upvotes: 1

Views: 193

Answers (3)

QuiteClose

Reputation: 686

We can use the re module to do this. Let's define the pattern we're trying to match.

We'll need to match a string of digits:

\d+

These digits may be followed by either a period or a hyphen:

\d+[\-\.]?

And this pattern can repeat many times:

(\d+[\-\.]?)*

Finally, we always end with at least one digit:

(\d+[\-\.]?)*\d+

This pattern can be used to define a function that returns a version number from a filename:

import re

def version_from(filename, pattern=r'(\d+[\-\.]?)*\d+'):
    match = re.search(pattern, filename)
    if match:
        return match.group(0)
    else:
        return None

Now we can use the function to extract all the versions from the data you provided:

data = ['xyz-1.23.35.10.2.rpm', 'xyz-linux-version-90-12-13-689.tar.gz', 'xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm']

versions = [version_from(filename) for filename in data]

The result is the list you asked for:

['1.23.35.10.2', '90-12-13-689', '13.23.789.0']
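As a quick check, here is a minimal sketch that applies the same idea to the filenames exactly as written in the question (with dots throughout). The names VERSION_RE and extract_version are illustrative only, and the group is written as non-capturing with a required separator, which should match the same strings as the pattern above.

```python
import re

# Digit runs joined by '.' or '-', always ending on a digit.
# (?:...) groups without capturing, since only the whole match is needed.
VERSION_RE = re.compile(r'(?:\d+[-.])*\d+')

def extract_version(filename):
    """Return the first version-like run in filename, or None if there is none."""
    match = VERSION_RE.search(filename)
    return match.group(0) if match else None

filenames = [
    'xyz-1.23.35.10.2.rpm',
    'xyz-linux-version-90.12.13.689.tar.gz',
    'xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm',
]
for name in filenames:
    print(name, '->', extract_version(name))
```

Compiling the pattern once is a small convenience if the function is called on many filenames.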

Upvotes: 1

Pythonista

Reputation: 11615

I'm not sure if there's a better way, as regular expressions aren't really my thing, but here's one way to get the version of your files, assuming the only numbers in the filenames are the version parts in this format.

import re
strings = [
    "xyz-1.23.35.10.2.rpm",
    "xyz-linux-version-90.12.13.689.tar.gz",
    "xyz-xyz-xyz-13.23.789.0-xyz-xyz.rpm",
]
for string in strings:
    matches = re.findall(r"\d+", string)
    version = ".".join(matches)
    print(version)

Result:

1.23.35.10.2
90.12.13.689
13.23.789.0

Upvotes: 1

VlassisFo

Reputation: 660

Assuming that the only numbers in your string are the version you need to extract, you could try something like this:

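def func(someString):
    version = ''
    found = False
    for character in someString:
        if character.isdigit():
            # A digit starts (or continues) the version part.
            found = True
        elif character.isalpha():
            # A letter ends the version part.
            found = False
        if found:
            # Keep digits and any separators between them.
            version += character
    # Drop the separator left over after the last digit, e.g. the '.' before 'rpm'.
    return version.rstrip('.-')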

Basically, we walk through the string one character at a time. When the version part begins, found becomes True (because isdigit() returns True for a digit character such as '1'), and from then on each character, including the separators between the numbers, is added to the version string until a letter is reached. isdigit() and isalpha() are built-in string methods, so you don't need to import anything.
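To see why the separators inside the version survive the loop, it may help to look at how the two tests classify each character. This is a small sketch using a hypothetical helper, classify, which is not part of the function above; the short string 'z-1.2.r' is also just an illustration.

```python
def classify(character):
    """Label a character the way the loop's two tests see it."""
    if character.isdigit():
        return 'digit'
    if character.isalpha():
        return 'letter'
    return 'other'  # separators such as '.' and '-' land here

# Each character paired with its label.
labels = [(ch, classify(ch)) for ch in 'z-1.2.r']
print(labels)
```

Because '.' and '-' are neither digits nor letters, they leave found unchanged. A separator between two numbers is therefore kept, and a separator directly after the last digit is collected by the loop as well.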

P.S. I haven't tested this for errors

Upvotes: 0
