whitespace

Reputation: 13

Python RegEx matching substrings on various conditions

Been struggling with this one for a while now - I simply can't wrap my brain around it.

Given the following string variations:

some text
some text http://a.link.to/something
some text - http://a.link.to/something
some text: http://a.link.to/something
http://a.link.to/something

I am looking for a RegEx that would produce the following:

{'text': 'some text',
 'link': ''}

{'text': 'some text',
 'link': 'http://a.link.to/something'}

{'text': '',
 'link': 'http://a.link.to/something'}

Cheers!

Upvotes: 0

Views: 58

Answers (2)

Avinash Raj

Reputation: 174696

Use named capturing groups with re.match so that groupdict() returns a dictionary whose keys you choose yourself.

>>> import re
>>> s = '''some text
... some text http://a.link.to/something
... some text - http://a.link.to/something
... some text: http://a.link.to/something
... http://a.link.to/something'''
>>> for i in s.split('\n'):
...     re.match(r'(?P<text>(?:(?!http://).)*?)\W*\b(?P<link>http://.*)?$', i).groupdict()
...

{'link': None, 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': 'some text'}
{'link': 'http://a.link.to/something', 'text': ''}
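
Note that a missing link comes back as None, while the question asks for an empty string. A minimal sketch of one way to get that, using the default argument of groupdict(); the helper name split_text_link and the fallback for lines that do not match at all are assumptions for illustration, not part of the answer above:

```python
import re

# Same pattern as in the answer; only the wrapper around it is new.
PATTERN = re.compile(r'(?P<text>(?:(?!http://).)*?)\W*\b(?P<link>http://.*)?$')

def split_text_link(line):
    """Split one line into text and link, using '' for whichever part is missing."""
    match = PATTERN.match(line)
    if match is None:
        # Assumption: a line the pattern cannot match (e.g. an empty one)
        # is reported as having neither text nor link.
        return {'text': '', 'link': ''}
    # groupdict(default='') substitutes '' for groups that did not participate.
    return match.groupdict(default='')

for line in ['some text', 'some text - http://a.link.to/something']:
    print(split_text_link(line))
```

Passing default='' keeps the pattern itself unchanged, so the None handling stays in one place.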

Upvotes: 3

Federico Piazza

Reputation: 30985

You can use a regex like this:

(.+?)(http.*)?$


This does not fully achieve what you want for a case such as:

some text - http://a.link.to/something

Since it generates:

{'text': 'some text - ',  'link': 'http://a.link.to/something'}
                    ^--- Dash here

But you can clean up the text either before or after matching.
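
As a rough sketch of the post-clean route, the trailing separator can be stripped from the text group after matching. Two things here are assumptions rather than part of the regex above: the helper name parse, and the switch from `.+?` to `.*?`, made because `.+?` needs at least one character and would otherwise put a link-only line into the text group:

```python
import re

# Assumption: `.*?` instead of the original `.+?`, so that a line holding
# only a link leaves the text group empty.
PATTERN = re.compile(r'(.*?)(http.*)?$')

def parse(line):
    """Split a single line (no embedded newlines) into text and link."""
    # groups(default='') yields '' for the link group when no link is present.
    text, link = PATTERN.match(line).groups(default='')
    # Post-clean: drop trailing spaces, dashes and colons left before the link.
    return {'text': text.rstrip(' -:'), 'link': link}

for line in ['some text - http://a.link.to/something', 'http://a.link.to/something']:
    print(parse(line))
```

Stripping with rstrip(' -:') only covers the separators shown in the question; other separators would need to be added to that set.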

I'm posting the answer since it might help you.

Upvotes: 1
