MikeT

Reputation: 65

Catching a word with re.findall

I'm trying to build a Name:ID dictionary from a web page that has a list of accounts. For each account, the HTML looks like this:

<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="user.php?id=14">Mike</a>

The Python code I've built:

IDs = re.findall('(?<=user.php\?id=\")\w+(?=\">)', src)
names = re.findall('(?<=UserWindow\(\')\w+(?=\',\')', src)

The names pattern works perfectly, but the IDs pattern doesn't match anything and I don't know how to fix it. Any ideas? Thanks in advance.
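A minimal reproduction, assuming src holds only the two sample lines above. Both patterns are the ones from the question, rewritten as raw strings (the regexes themselves are unchanged). The names pattern finds Mike, while the IDs pattern finds nothing: its lookbehind demands a " straight after id=, and the HTML has none.

```python
import re

# Assumption: src contains only the two sample lines from the question.
src = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="user.php?id=14">Mike</a>'''

# The question's two patterns, written as raw strings (same regexes).
ids = re.findall(r'(?<=user.php\?id=\")\w+(?=\">)', src)
names = re.findall(r"(?<=UserWindow\(')\w+(?=',')", src)

print(names)  # ['Mike']
print(ids)    # [] because the lookbehind requires id=" but the HTML has id=14
```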

Upvotes: 1

Views: 89

Answers (2)

Casimir et Hippolyte

Reputation: 89584

Another way, using re.finditer:

#!/usr/bin/python
import re

html = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="javascript:UserWindow('Dorothee','184','0','27s', 'profile')">
<a href="javascript:UserWindow('Esmeralda','7','0','27s', 'profile')">
<a href="javascript:UserWindow('Monique','16','0','27s', 'profile')">'''

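# Group 1 captures the name, group 2 the ID (the first two UserWindow arguments)
pattern = r"UserWindow\('([^']+)','([^']+)"

users = {}

# finditer yields one match object per account, so each name stays paired with its ID
for m in re.finditer(pattern, html):
    users[m.group(1)] = m.group(2)

print(users)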
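The same two-group pattern also works with re.findall: when a pattern has more than one capturing group, findall returns a list of (group 1, group 2) tuples, which dict() can consume directly. This is a compact variant of the loop above, not a different technique; as with the loop, a name that appears twice keeps its last ID.

```python
import re

html = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="javascript:UserWindow('Dorothee','184','0','27s', 'profile')">
<a href="javascript:UserWindow('Esmeralda','7','0','27s', 'profile')">
<a href="javascript:UserWindow('Monique','16','0','27s', 'profile')">'''

# Group 1 = name, group 2 = ID, as in the finditer version.
pattern = r"UserWindow\('([^']+)','([^']+)"

# With two capturing groups, findall returns (name, id) tuples,
# and dict() turns that list into a name -> ID mapping.
users = dict(re.findall(pattern, html))
print(users)  # {'Mike': '14', 'Dorothee': '184', 'Esmeralda': '7', 'Monique': '16'}
```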

Upvotes: 0

thefourtheye

Reputation: 239573

In your HTML the ID isn't preceded by a " (the link reads id=14, not id="14), so the lookbehind never matches. Drop the \" from it:

)', src)">
# No quote after id=; the dot in user.php is escaped so it only matches a literal "."
IDs = re.findall(r'(?<=user\.php\?id=)\w+(?=">)', src)
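)', src)">
To finish building the Name:ID dictionary from the question, the fixed IDs list can be zipped with the names list. This is a sketch, assuming every account on the page contributes exactly one match to each list and in the same order; the two-account src below is made-up sample data in the question's format. If the lists could get out of step, capturing both values with one pattern, as in the finditer answer, avoids that assumption.

```python
import re

# Assumption: src is the page HTML; here, two accounts in the question's format.
src = '''<a href="javascript:UserWindow('Mike','14','0','27s', 'profile')">
<a href="user.php?id=14">Mike</a>
<a href="javascript:UserWindow('Dorothee','184','0','27s', 'profile')">
<a href="user.php?id=184">Dorothee</a>'''

# Fixed ID pattern: no quote after "id=", dot escaped.
ids = re.findall(r'(?<=user\.php\?id=)\w+(?=">)', src)
names = re.findall(r"(?<=UserWindow\(')\w+(?=',')", src)

# zip pairs the i-th name with the i-th ID; this relies on both lists
# having one entry per account, in page order.
users = dict(zip(names, ids))
print(users)  # {'Mike': '14', 'Dorothee': '184'}
```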

Upvotes: 2
