Reputation: 85
I am trying to parse a string with a regular expression in Python and assign the parsed values to two variables.
For instance, if I have the string
<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>
I want to assign the value 1 to an integer variable called rank and the values [Michael, Jessica] to an array called name.
When I parse using re.search() and assign the value using the .group() function, the type of the assigned variables is _sre.SRE_Match. Can you please help me convert them to integer and string formats respectively?
Upvotes: 2
Views: 7988
Reputation: 369394
The following line:
rank = re.search('(\d)+', line)
should be replaced with:
rank = re.search(r'\d+', line).group() # (..) is not needed
to get a string.
If you want an int object, use int:
rank = int(re.search(r'\d+', line).group())
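Note that re.search returns None when nothing matches, so calling .group() on the result directly raises AttributeError for such lines. A minimal sketch of a guarded version follows; the helper name parse_rank is hypothetical and not part of the original code.

```python
import re

def parse_rank(line):
    """Return the first run of digits in `line` as an int, or None if there is none."""
    match = re.search(r'\d+', line)  # a Match object, or None when nothing matches
    if match is None:
        return None
    return int(match.group())        # .group() returns the matched text as a str

line = '<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>'
rank = parse_rank(line)
print(type(rank).__name__, rank)
```

The Match object itself is never stored in rank here; only the converted value of .group() is.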
BTW, your program can be simplified by using re.findall.
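import re

def extract_rankname(line):
    # Capture the text of every <td>...</td> cell, in order.
    groups = re.findall(r'<td>(.*?)</td>', line)
    try:
        rank = groups[0]  # or int(groups[0]) to get an integer
        return {rank: groups[1:]}
    except (IndexError, ValueError):
        # IndexError: no cells found; ValueError: int() failed on the first cell
        return {}  # or return None

extract_rankname('<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>')
# => {'1': ['Michael', 'Jessica']}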
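If an integer rank and a separate list of names are wanted, as in the question, the same re.findall idea could be sketched like this. The function name extract_rank_and_names and the choice to return None for rows that do not fit are assumptions made for illustration.

```python
import re

def extract_rank_and_names(line):
    """Return (rank, names) with rank as an int, or None if the row does not fit."""
    cells = re.findall(r'<td>(.*?)</td>', line)
    try:
        return int(cells[0]), cells[1:]
    except (IndexError, ValueError):
        # IndexError: no <td> cells at all; ValueError: first cell is not a number
        return None

row = '<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>'
result = extract_rank_and_names(row)
if result is not None:
    rank, name = result
    print(rank, name)
```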
Alternatively, when parsing HTML, it's better to use a library such as BeautifulSoup or lxml instead of regular expressions.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup('<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>', 'lxml')
>>> [td.text for td in soup.find_all('td')]
[u'1', u'Michael', u'Jessica']
>>> tds = [td.text for td in soup.find_all('td')]
>>> tds[0], tds[1:]
(u'1', [u'Michael', u'Jessica'])
>>> print(tds[0]) # rank
1
>>> tds[1:] # names
[u'Michael', u'Jessica']
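If installing a third-party library is not an option, a similar result can be sketched with the standard library's html.parser module. This is a swapped-in technique, not BeautifulSoup; it assumes Python 3, and the class name CellCollector is hypothetical.

```python
from html.parser import HTMLParser

class CellCollector(HTMLParser):
    """Collect the text content of every <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == 'td':
            self._in_td = True
            self.cells.append('')  # start a new, empty cell

    def handle_endtag(self, tag):
        if tag == 'td':
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self.cells[-1] += data  # text may arrive in several pieces

parser = CellCollector()
parser.feed('<tr align="right"><td>1</td><td>Michael</td><td>Jessica</td>')
parser.close()
rank, names = int(parser.cells[0]), parser.cells[1:]
print(rank, names)
```

This is more verbose than the BeautifulSoup version, but it avoids the extra dependency.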
Upvotes: 5
Reputation: 10476
You can try this:
<td>(\w+)<\/td>
Then iterate through the matches and assign each one to an array or a variable.
Sample code:
import re

regex = r"<td>(\w+)<\/td>"
test_str = "<tr align=\"right\"><td>1</td><td>Michael</td><td>Jessica</td>"
rank = None  # stays None if no numeric cell is found
values = []
matches = re.finditer(regex, test_str)
for match in matches:
    if match.group(1).isdigit():
        # A cell made only of digits is treated as the rank.
        rank = int(match.group(1))
    else:
        values.append(match.group(1))
print(rank)
print(values)
Upvotes: 1