Reputation: 13
Been struggling with this one for a while now - I simply can't wrap my brain around it.
Given the following string variations:
some text
some text http://a.link.to/something
some text - http://a.link.to/something
some text: http://a.link.to/something
http://a.link.to/something
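I am looking for a regex that would produce the following (the first result for text only, the second for text followed by a link with any separator, the third for a link only):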
{'text': 'some text',
'link': ''}
{'text': 'some text',
'link': 'http://a.link.to/something'}
{'text': '',
'link': 'http://a.link.to/something'}
Cheers!
Upvotes: 0
Views: 58
Reputation: 174696
Use named capturing groups with re.match, so that you can build a dictionary with your own keys by calling groupdict() on the match object.
>>> import re
>>> s = '''some text
... some text http://a.link.to/something
... some text - http://a.link.to/something
... some text: http://a.link.to/something
... http://a.link.to/something'''
>>> for i in s.split('\n'):
...     re.match(r'(?P<text>(?:(?!http://).)*?)\W*\b(?P<link>http://.*)?$', i).groupdict()
...
{'text': 'some text', 'link': None}
{'text': 'some text', 'link': 'http://a.link.to/something'}
{'text': 'some text', 'link': 'http://a.link.to/something'}
{'text': 'some text', 'link': 'http://a.link.to/something'}
{'text': '', 'link': 'http://a.link.to/something'}
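To make the pattern easier to follow, here is a minimal sketch of the same regex compiled with re.VERBOSE so each piece can carry a comment. It also passes '' to groupdict(), which uses that value as the default for groups that did not take part in the match, so a missing link comes back as '' (as the question asked) instead of None. The helper name split_line is only for illustration, and the sketch has only been reasoned through for the five lines from the question.

```python
import re

# Same pattern as in the answer, written in verbose mode so each part can be commented.
PATTERN = re.compile(r"""
    (?P<text>(?:(?!http://).)*?)   # text: lazily take characters, never stepping into "http://"
    \W*                            # skip any separator (spaces, dash, colon) before the link
    \b                             # word boundary: just before the link, or at the end of a link-less line
    (?P<link>http://.*)?           # optional link: "http://" and everything after it
    $                              # anchor to the end so the lazy text group is forced to expand
""", re.VERBOSE)

def split_line(line):
    # groupdict('') returns '' rather than None for a group that did not match
    return PATTERN.match(line).groupdict('')

lines = [
    "some text",
    "some text http://a.link.to/something",
    "some text - http://a.link.to/something",
    "some text: http://a.link.to/something",
    "http://a.link.to/something",
]
for line in lines:
    print(split_line(line))
```

Note that .match() returns None when the pattern does not match at all (for example a line with no link that ends in punctuation), so real input may need a check before calling groupdict().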
Upvotes: 3
Reputation: 30985
You can use a regex like this:
(.+?)(http.*)?$
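Note that it does not fully achieve what you want for a case like:
some text - http://a.link.to/something
since it generates:
{'text': 'some text - ', 'link': 'http://a.link.to/something'}
that is, the dash and the spaces around it are left at the end of the text. The other variations leave a trailing space or ': ' behind in the same way. You can fix that with a pre- or post-cleaning step on the text.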
I'm posting the answer since it might help you.
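A minimal sketch of the post-clean idea in Python. Two assumptions that are not from the answer itself: the pattern uses (.*?) instead of (.+?), because (.+?) must consume at least one character, so for a link-only line the link group can never match and the whole URL ends up in the text; and the characters stripped afterwards (space, dash, colon) are simply the separators that appear in the question's examples. The helper name split_line is hypothetical.

```python
import re

# The answer's pattern, adjusted to (.*?) so the text part may be empty.
PATTERN = re.compile(r'(.*?)(http.*)?$')

def split_line(line):
    # groups('') returns '' for the link when it is absent, instead of None
    text, link = PATTERN.match(line).groups('')
    # Post-clean: remove separators such as " - " or ": " left in front of the link
    text = text.rstrip(' -:')
    return {'text': text, 'link': link}

for line in [
    "some text",
    "some text http://a.link.to/something",
    "some text - http://a.link.to/something",
    "some text: http://a.link.to/something",
    "http://a.link.to/something",
]:
    print(split_line(line))
```

Because rstrip() works on a set of characters, it would also remove a dash or colon that legitimately ends the text; if that matters, the separator-skipping \W* approach from the other answer avoids it.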
Upvotes: 1