Misha Lavrov

Reputation: 1

How do I parse the elements in this list?

I have a list to parse (and I am looking for a generic way to parse any list like this):

dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390
dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc
dev-lang/perl-5.28-r1 s390
virtual/ruby_gems-0.3_pre24 amd64 x86

This sometimes fails because it tries to parse the architecture list, starting with alpha and continuing to the end of the line. I want to ignore everything after the package version, while still allowing for a possible space after the version.

My code is as follows (the print calls are just for debugging):

for line in args.list:
    print(line)
    package_category = re.search(r'((?<==)\w+-\w+|\w+-\w+|\w+)', line).group(0)
    print(package_category)
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
    print(package_name)
    package_version = re.search(r'(?<=-)\d+.\d-*\w*\s?', line).group(0)

I expect this to do the following:

package_category variable should contain a category like:

dev-libs dev-lang virtual

package_name should contain a package name, like:

icu icu-layoutex perl ruby_gems

package_version:

63.1-r1 63.1 0.3_pre24

The rest should simply be ignored.

Currently I unexpectedly hit the architectures list, with this output:

dev-libs/icu-63.1-r1
dev-libs
icu
alpha
alpha
Traceback (most recent call last):
  File "./repomator.py", line 47, in <module>
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
AttributeError: 'NoneType' object has no attribute 'group'

Upvotes: 0

Views: 81

Answers (1)

Toto

Reputation: 91385

Is this what you want?

(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)


Explanation:

(?P<category>           # named group category (Python syntax)
  \w+                   # 1 or more word characters
  (?:-\w+)?             # optional: a dash, then 1 or more word characters
)                       # end group
/                       # a slash
(?P<name>               # named group name
  [a-z]+                # 1 or more lowercase letters
  (?:[-_][a-z]+)?       # optional: a dash or underscore, then 1 or more lowercase letters
)                       # end group
-                       # a dash
(?P<version>            # named group version
  \S+                   # 1 or more non-space characters
)                       # end group
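
As a minimal sketch, the commented layout above can be used directly in Python by compiling the pattern with re.VERBOSE, which ignores whitespace and `#` comments inside the pattern. The name PACKAGE_RE and the sample string are only illustrative, not part of the original answer.

```python
import re

# Same pattern as above, written with re.VERBOSE so the comments live in the regex.
PACKAGE_RE = re.compile(r"""
    (?P<category>\w+(?:-\w+)?)          # category, e.g. dev-libs or virtual
    /                                   # a slash
    (?P<name>[a-z]+(?:[-_][a-z]+)?)     # name, e.g. icu-layoutex or ruby_gems
    -                                   # a dash
    (?P<version>\S+)                    # version: everything up to the next space
""", re.VERBOSE)

m = PACKAGE_RE.search('dev-libs/icu-layoutex-63.1 alpha amd64')
print(m.groupdict())
```

groupdict() returns the three named groups as a dictionary, which may be more convenient than calling group() three times.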

code:

import re

# Named `lines` rather than `list` to avoid shadowing the built-in type.
lines = [
'dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390 ',
'dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc',
'dev-lang/perl-5.28-r1 s390',
'virtual/ruby_gems-0.3_pre24 amd64 x86'
]
# Compile once, then reuse the pattern for every line.
pattern = re.compile(r'(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)')
for line in lines:
    res = pattern.search(line)
    print("cat: ", res.group('category'), "\t  name: ", res.group('name'), "\t\tversion: ", res.group('version'))

Output:

cat:  dev-libs    name:  icu        version:  63.1-r1
cat:  dev-libs    name:  icu-layoutex       version:  63.1
cat:  dev-lang    name:  perl       version:  5.28-r1
cat:  virtual     name:  ruby_gems      version:  0.3_pre24
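
The traceback in the question comes from calling .group() on None, which is what re.search returns when nothing matches. The debug output there ('alpha' printed as its own "line") suggests the loop may be receiving whitespace-separated tokens rather than whole lines; that is an assumption, since the question does not show how args.list is built. Under that assumption, a rough sketch that skips anything that does not look like a package (parse_packages and tokens are hypothetical names):

```python
import re

PACKAGE_RE = re.compile(
    r'(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)'
)

def parse_packages(items):
    """Yield (category, name, version) for each item that looks like a package."""
    for item in items:
        match = PACKAGE_RE.search(item)
        if match is None:
            # No category/name-version here, e.g. a bare keyword such as 'alpha'.
            continue
        yield match.group('category'), match.group('name'), match.group('version')

# Hypothetical input: the first line of the list, already split on whitespace.
tokens = 'dev-libs/icu-63.1-r1 alpha amd64 arm'.split()
print(list(parse_packages(tokens)))
```

The same function works on whole lines too, since search() finds the package atom wherever it sits in the string.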

Upvotes: 1
