Use regex to get info from a specific text format

Question

I have some text that contains lines like this:

(some text)
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;(some text)
libx32ncursesw5 depends on libc6-x32 (>= 2.16);(some text)
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1);(some text)
libx32ncursesw5-dev depends on libc6-dev-x32;(some text)
lib32tinfo-dev depends on lib32c-dev;(some text)

Here is a full example of one of these sentences in context:

dpkg: error processing package lib32tinfo5 (--install):
 dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of libncurses5-dev:amd64:
 libncurses5-dev:amd64 depends on libc6-dev | libc-dev; however:
    Package libc6-dev is not installed.
    Package libc-dev is not installed.

The whole text is divided into several paragraphs like the one above, and each paragraph contains one of those sentences.

I would like a regex, using Python's re library, that gives me something like this with findall:

('libc6-dev', '', 'libc-dev', '')
('libc6-x32','2.16')
('libncurses5-dev','5.9+20150516-2ubuntu1')
('libc6-dev-x32','')
('lib32c-dev','')

In other words, I would like help extracting from this text one tuple per sentence, containing the packages and their versions where specified.

I wrote this regex:

(?<=depends on )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?|(?: \| )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?(?=;)

I got this result:

('libc6-dev', '', '', '')
('', '', 'libc-dev', '')
('libc6-x32', '2.16', '', '')
('libncurses5-dev', '5.9+20150516-2ubuntu1', '', '')
('libc6-dev-x32', '', '', '')
('lib32c-dev', '', '', '')

As you can see, for the sentence:

libncursesw5-dev:amd64 depends on libc6-dev | libc-dev;

I got this result:

('libc6-dev', '', '', '')
('', '', 'libc-dev', '')

rather than this one:

('libc6-dev', '', 'libc-dev', '')
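
For reference, the split result can be reproduced with the short script below. It assumes the parentheses in the pattern are escaped as \( and \), and it uses one sample line from the text. The likely cause is that findall returns one tuple per match, and the top-level | makes "libc6-dev" and " | libc-dev" two separate matches, each filling only the groups of its own branch.

```python
import re

# One sample line from the text above.
line = "libncursesw5-dev:amd64 depends on libc6-dev | libc-dev; however:"

# The pattern from the question, assuming "\(" and "\)" around the version part.
pattern = re.compile(
    r"(?<=depends on )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?"
    r"|(?: \| )([a-zA-Z0-9\-]*)(?: \([=> ]*([a-zA-Z0-9-+.]*)(?:\)))?(?=;)"
)

# findall yields one tuple per match; groups from the branch that did not
# take part in a given match come back as empty strings.
matches = pattern.findall(line)
print(matches)
```

The first tuple comes from the left branch (groups 1 and 2) and the second from the right branch (groups 3 and 4), so a single pattern of this shape cannot merge them into one tuple.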

Thank you for your help.
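
One possible way to get a single tuple per sentence is to do it in two steps instead of one regex: first capture everything between "depends on " and the next ";", then split that clause on " | " and parse each alternative separately. The sketch below is only an illustration under a few assumptions: the names DEPENDS, ALTERNATIVE and parse_dependencies are made up for this example, package names are assumed to contain only letters, digits, ".", "+" and "-", and the sample text is built from the lines in the question.

```python
import re

# Sample lines taken from the question (the "; however:" tail is from the
# full dpkg example).
text = """\
libncursesw5-dev:amd64 depends on libc6-dev | libc-dev; however:
libx32ncursesw5 depends on libc6-x32 (>= 2.16); however:
libx32ncurses5-dev depends on libncurses5-dev (= 5.9+20150516-2ubuntu1); however:
libx32ncursesw5-dev depends on libc6-dev-x32; however:
lib32tinfo-dev depends on lib32c-dev; however:
"""

# Step 1: capture the whole clause between "depends on " and the next ";".
DEPENDS = re.compile(r"depends on ([^;]+);")

# Step 2: one alternative is a package name, optionally followed by
# " (<operator> <version>)". The name character class is an assumption.
ALTERNATIVE = re.compile(r"([a-zA-Z0-9.+-]+)(?: \([<>=]+ ([^)\s]+)\))?")


def parse_dependencies(text):
    """Return one flat tuple (name, version, name, version, ...) per clause."""
    results = []
    for clause in DEPENDS.findall(text):
        flat = []
        for part in clause.split(" | "):
            m = ALTERNATIVE.fullmatch(part.strip())
            if m:
                # A missing version is reported as an empty string.
                flat.extend([m.group(1), m.group(2) or ""])
        results.append(tuple(flat))
    return results


for row in parse_dependencies(text):
    print(row)
```

With the sample lines this prints the five tuples listed as the desired output. An alternative that does not fit the assumed name pattern (for example one carrying an architecture suffix such as ":amd64") is silently skipped by this sketch, so the character class may need widening for real dpkg output.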
