I have a list to parse (but I am looking for a generic way to parse any list like this):
dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390
dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc
dev-lang/perl-5.28-r1 s390
virtual/ruby_gems-0.3_pre24 amd64 x86
This sometimes fails, because it tries to parse the architecture list, starting with alpha, through to the end of the line. I really want to ignore everything after the package version, while still allowing for whitespace after the version.
My code is the following (the print calls are just for debugging):
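A minimal sketch of that requirement, assuming each entry is one whole line held in a single string: keep only the first whitespace-separated token (the package atom), so the architecture keywords never reach a regex. This uses plain `str.split` and `str.partition` rather than the asker's regexes; the variable names are illustrative.

```python
# Sketch: isolate the package atom (first whitespace-separated token)
# so the trailing architecture keywords are never seen by a regex.
lines = [
    "dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390",
    "dev-lang/perl-5.28-r1 s390",
    "virtual/ruby_gems-0.3_pre24 amd64 x86",
]

atoms = []
for line in lines:
    # split() with no separator splits on any run of whitespace and ignores
    # leading/trailing spaces; maxsplit=1 stops after the first cut.
    atom = line.split(maxsplit=1)[0]
    # partition("/") separates the category from "name-version".
    category, _, name_and_version = atom.partition("/")
    atoms.append((category, name_and_version))
    print(category, name_and_version)
```

The name and version still have to be separated afterwards, but only within a string like `icu-63.1-r1`.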
for line in args.list:
    print(line)
    package_category = re.search(r'((?<==)\w+-\w+|\w+-\w+|\w+)', line).group(0)
    print(package_category)
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
    print(package_name)
    package_version = re.search(r'(?<=-)\d+.\d-*\w*\s?', line).group(0)
I expect this to do the following:
The package_category variable should contain a category, like:
dev-libs
dev-lang
virtual
package_name should contain a package name, like:
icu
icu-layoutex
perl
ruby_gems
package_version should contain the version, like:
63.1-r1
63.1
0.3_pre24
The rest should just be ignored.
Currently I somehow end up in the architectures list, with this output:
dev-libs/icu-63.1-r1
dev-libs
icu
alpha
alpha
Traceback (most recent call last):
  File "./repomator.py", line 47, in <module>
    package_name = re.search(r'(?<=/)[a-z]+.[a-z]+', line).group(0)
AttributeError: 'NoneType' object has no attribute 'group'
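The AttributeError itself comes from calling `.group(0)` on the result of `re.search`, which returns `None` when the pattern does not match. The printed output shows `alpha` being processed as if it were a line on its own. That suggests, though this is an assumption, that `args.list` holds whitespace-split tokens rather than whole lines. A minimal sketch of guarding the match, reusing the question's `package_name` pattern inside a hypothetical helper:

```python
import re

# Same pattern as in the question's package_name line.
NAME_RE = re.compile(r'(?<=/)[a-z]+.[a-z]+')

def package_name(line):
    """Return the matched name, or None instead of raising AttributeError."""
    m = NAME_RE.search(line)
    if m is None:  # re.search returns None when nothing matches
        return None
    return m.group(0)

print(package_name("dev-libs/icu-63.1-r1"))
print(package_name("alpha"))  # no "/" in the input, so no match
```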
Is this what you want?
(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)
Explanation:
(?P<category>        # named group category
    \w+              # 1 or more word characters
    (?:-\w+)?        # optional: a dash, then 1 or more word characters
)                    # end group
/                    # a slash
(?P<name>            # named group name
    [a-z]+           # 1 or more lowercase letters
    (?:[-_][a-z]+)?  # optional: a dash or underscore, then 1 or more lowercase letters
)                    # end group
-                    # a dash
(?P<version>         # named group version
    \S+              # 1 or more non-space characters
)                    # end group
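Python's `re.VERBOSE` flag ignores unescaped whitespace and `#` comments in a pattern, so the commented breakdown above can be compiled more or less as-is. A sketch, assuming the same pattern as the answer (the name `ATOM_RE` is illustrative):

```python
import re

# The answer's pattern with the explanation embedded via re.VERBOSE.
ATOM_RE = re.compile(r"""
    (?P<category>          # named group category
        \w+                # 1 or more word characters
        (?:-\w+)?          # optional: a dash, then 1 or more word characters
    )
    /                      # a slash
    (?P<name>              # named group name
        [a-z]+             # 1 or more lowercase letters
        (?:[-_][a-z]+)?    # optional: dash or underscore, then lowercase letters
    )
    -                      # a dash
    (?P<version>           # named group version
        \S+                # 1 or more non-space characters
    )
""", re.VERBOSE)

m = ATOM_RE.search("dev-libs/icu-layoutex-63.1 alpha amd64 ia64")
print(m.groupdict())
```

`groupdict()` returns all named groups at once, which saves three separate `group()` calls.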
Code:
import re

# Renamed from "list" to avoid shadowing the built-in list type.
lines = [
    'dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390 ',
    'dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc',
    'dev-lang/perl-5.28-r1 s390',
    'virtual/ruby_gems-0.3_pre24 amd64 x86'
]
for line in lines:
    res = re.search(r'(?P<category>\w+(?:-\w+)?)/(?P<name>[a-z]+(?:[-_][a-z]+)?)-(?P<version>\S+)', line)
    # Python 3 print function (the original used the Python 2 print statement).
    print("cat: ", res.group('category'), "\t name: ", res.group('name'), "\t\tversion: ", res.group('version'))
Output:
cat: dev-libs name: icu version: 63.1-r1
cat: dev-libs name: icu-layoutex version: 63.1
cat: dev-lang name: perl version: 5.28-r1
cat: virtual name: ruby_gems version: 0.3_pre24
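The `name` group above only accepts lowercase letters with at most one extra dash or underscore part, so a name containing digits would not match. Since the question asks for a more generic approach, here is a hedged variant. It assumes (an assumption, not a checked rule of the package format) that the version always starts at the first `-` immediately followed by a digit, and it uses a lazy `\S+?` for the name. `GENERIC_RE` and `parse_atom` are hypothetical names, and the `x264` line is a made-up test input.

```python
import re

# Hypothetical, more generic variant: assumes the version always begins at
# the first "-" that is immediately followed by a digit.
GENERIC_RE = re.compile(
    r"(?P<category>[^/\s]+)/(?P<name>\S+?)-(?P<version>\d\S*)"
)

def parse_atom(line):
    """Return a dict with category, name and version, or None if no match."""
    m = GENERIC_RE.match(line.strip())
    return m.groupdict() if m else None

for line in [
    "dev-libs/icu-63.1-r1 alpha amd64 arm arm64 ia64 ppc ppc64 x86 hppa s390 ",
    "dev-libs/icu-layoutex-63.1 alpha amd64 ia64 ppc ppc64 x86 hppa sparc",
    "dev-lang/perl-5.28-r1 s390",
    "virtual/ruby_gems-0.3_pre24 amd64 x86",
    "media-libs/x264-0.0.20190214",  # hypothetical input: name with digits
]:
    print(parse_atom(line))
```

The lazy `\S+?` stops the name at the earliest `-<digit>`. This keeps `icu-layoutex` together and splits off `63.1`, and `\S*` stops the version at the first space.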