Reputation: 65
I'm trying to build a dictionary of Names:IDs from a web page that has a list of accounts. For each account, the HTML code looks like this:
<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="user.php?id=14">Mike</a>
The Python code that I've built:
IDs = re.findall('(?<=user.php\?id=\")\w+(?=\">)', src)
names = re.findall('(?<=UserWindow\(\')\w+(?=\',\')', src)
It works perfectly for the names, but it doesn't work for the IDs and I don't know how to fix it. Any ideas? Thanks in advance.
Upvotes: 1
Views: 89
Reputation: 89584
Another way, using finditer:
#!/usr/bin/python
import re
html = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="javascript:UserWindow('Dorothee','184','0','27s', 'profile')">
<a href="javascript:UserWindow('Esmeralda','7','0','27s', 'profile')">
<a href="javascript:UserWindow('Monique','16','0','27s', 'profile')">'''
pattern = r'UserWindow\(\'([^\']+)\',\'([^\']+)'
users = {}
matches = re.finditer(pattern, html)
for m in matches:
    users[m.group(1)] = m.group(2)  # group 1 is the name, group 2 is the id
print(users)
Upvotes: 0
Reputation: 239573
Your id doesn't seem to start with a double quote ("), so the lookbehind never matches. Drop the quote from it and use this:
IDs = re.findall(r'(?<=user\.php\?id=)\w+(?=">)', src)
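If the goal is the Names:IDs dictionary itself, the id and the link text can probably be captured together from the user.php links instead of running two separate searches. A minimal sketch, assuming every account has a link in the form shown in the question; the sample src below is made up for illustration, and the variable names pairs and users are not from the original code:

import re

```python
import re

# Hypothetical sample of the page source, modeled on the HTML in the question.
src = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="user.php?id=14">Mike</a>
<a href="javascript:UserWindow('Dorothee','184','0','27s', 'profile')">
<a href="user.php?id=184">Dorothee</a>'''

# Capture the id and the link text of each user.php anchor in one pass.
pairs = re.findall(r'user\.php\?id=(\w+)">([^<]+)</a>', src)

# findall returns (id, name) tuples; flip them into a name -> id dictionary.
users = {name: user_id for user_id, name in pairs}
print(users)
```

Because each name and id come from the same match, they cannot get out of step the way two independent findall lists could if one pattern missed an entry.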
Upvotes: 2