Reputation: 3487
I need some help with regular expressions in Python. I have strings such as:
17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
How can I get this list?
['http://example1.com/viewtopic.php?f=8&t=189', 'http://example2.com', 'http://example3.com/threads/example-text-in-url.27304/']
Upvotes: 0
Views: 100
Reputation: 2291
Just try this; maybe it fits your needs :)
Regex
/^(.*;)/gm
String
17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
Matches
1. [0-66] `17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;`
2. [87-129] `17:22:32;http://example2.com;example2.com;`
3. [151-228] `20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;`
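As the matches show, this pattern is greedy and runs up to the last semicolon on each line, so it returns more than the URL. A minimal Python sketch, assuming the URL is always the second semicolon-separated field, skips the first field and captures only the second; the names `data` and `urls` are illustrative and not from the original post:

```python
import re

# Sample lines from the question (illustrative variable name).
data = """17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19"""

# ^[^;]*; skips the first field (the time); ([^;]*) captures the second field.
# re.MULTILINE makes ^ match at the start of every line.
# Assumption: the URL is always the second field and contains no semicolon.
urls = re.findall(r"^[^;]*;([^;]*);", data, flags=re.MULTILINE)
print(urls)
```

Matching on field position rather than on `http://` also picks up `https://` URLs, at the cost of relying on the column order staying fixed.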
Upvotes: 1
Reputation:
I'm going to give a regex solution since that is what you asked for. Basically, all you need to do is capture the text that starts with http:// and runs up to the next ; character. Below is a demonstration:
from re import findall
mystr = """
17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""
# Non-greedy .+? stops at the first ";" after each "http://".
print(findall(r"(http://.+?);", mystr))
Output:
['http://example1.com/viewtopic.php?f=8&t=189', 'http://example2.com', 'http://example3.com/threads/example-text-in-url.27304/']
Upvotes: 1
Reputation: 55215
You don't need regex here; use a csv parser.
Assuming your data is in a file called data.csv:
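import csv
# The with block closes the file once all rows have been read.
with open("data.csv") as f:
    reader = csv.reader(f, delimiter=";")
    referers = [line[1] for line in reader]  # second field is the URL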
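To try the csv approach without creating a file first, a sketch like the following feeds the sample lines to `csv.reader` through `io.StringIO` (Python 3). The names `sample` and `rows` are illustrative, and the length check is an added assumption so that blank lines do not raise an `IndexError`:

```python
import csv
import io

# Sample lines from the question, held in memory instead of data.csv.
sample = """17:25:31;http://example1.com/viewtopic.php?f=8&t=189;example1.com;127.0.0.1 2013-10-19
17:22:32;http://example2.com;example2.com;127.0.0.1 2013-10-19
20:18:28;http://example3.com/threads/example-text-in-url.27304/;example3.com;127.0.0.1 2013-10-19
"""

# csv.reader accepts any iterable of lines, so a StringIO works like a file.
rows = csv.reader(io.StringIO(sample), delimiter=";")

# Keep the second field of every row that has one (assumption: skip short rows).
referers = [row[1] for row in rows if len(row) > 1]
print(referers)
```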
Upvotes: 3