Reputation: 553
I have a string:
<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">
I want to get the value of "generated", but the code below doesn't work:
import re
doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
match = re.match(r'generated="(\d+ \d+:\d+:\d+.\d+)',doc)
The value of match is None. Can anyone help?
Upvotes: 2
Views: 412
Reputation: 369424
re.match matches only at the beginning of the string. Use re.search instead, which matches anywhere in the string. Note that the pattern below also escapes the dot (\.) so that it matches a literal period rather than any character.
>>> import re
>>> doc=r'<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc)
<_sre.SRE_Match object at 0x1010505d0>
>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group()
'generated="20170330 17:19:11.956'
>>> re.search(r'generated="(\d+ \d+:\d+:\d+\.\d+)',doc).group(1)
'20170330 17:19:11.956'
See "search() vs. match()" in the re module documentation.
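The difference can also be seen side by side in a small script. This is only a sketch: the helper name extract_generated and the closing quote added to the pattern are my own choices, not part of the question.

```python
import re

doc = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

# Compile once; the trailing '"' makes sure the whole attribute value is captured.
pattern = re.compile(r'generated="(\d+ \d+:\d+:\d+\.\d+)"')


def extract_generated(text):
    """Return the value of the generated attribute, or None if it is absent.

    Hypothetical helper, named here only for illustration.
    """
    found = pattern.search(text)  # search() scans the whole string
    return found.group(1) if found else None


# match() is anchored at position 0, and the string starts with '<robot',
# so this is None.
anchored = pattern.match(doc)

generated = extract_generated(doc)
```

Checking the result for None before calling group() avoids an AttributeError when the attribute is missing.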
Upvotes: 2
Reputation: 474161
You don't necessarily need regular expressions in this case. Here is an alternative idea that uses the BeautifulSoup XML/HTML parser together with the dateutil datetime parser:
In [1]: from dateutil.parser import parse
In [2]: from bs4 import BeautifulSoup
In [3]: data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'
In [4]: parse(BeautifulSoup(data, "html.parser").robot['generated'])
Out[4]: datetime.datetime(2017, 3, 30, 17, 19, 11, 956000)
I find this approach beautiful, easy and straightforward.
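If installing third-party packages is not an option, roughly the same result can be had from the standard library alone. The sketch below swaps BeautifulSoup and dateutil for re and datetime.strptime, and it assumes the timestamp always has the form YYYYMMDD HH:MM:SS.fff as in the question; the variable names are illustrative.

```python
import re
from datetime import datetime

data = '<robot generated="20170330 17:19:11.956" generator="Robot 3.0.2 (Python 2.7.13 on win32)">'

# Grab whatever sits between the quotes of the generated attribute.
found = re.search(r'generated="([^"]*)"', data)

# Assumed format: date as YYYYMMDD, then time with fractional seconds.
# %f accepts one to six digits, so "956" is read as 956000 microseconds.
generated_at = (
    datetime.strptime(found.group(1), "%Y%m%d %H:%M:%S.%f") if found else None
)
```

Unlike dateutil's parse, strptime raises ValueError if the value does not follow the assumed format, which may or may not be what you want.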
Upvotes: 1