Stefano Pozzi

Reputation: 619

Read URLs from .txt file Python

I am trying to extract URLs from a .txt file using regex (all the URLs end with .jpeg). This is my regex:

import re
output = re.findall('(http)(.*?)(jpeg)', text)

but my output looks like this:

('http', '://d1spq65clhrg1f.cloudfront.net/uploads/image_request/image/182/182382/182382534/cloudsight.', 'jpeg')

How can I avoid having the commas dividing the matches?

Upvotes: 0

Views: 573

Answers (4)

ergesto

Reputation: 397

import re

# Read the whole file, then match each URL from its scheme up to ".jpeg"
with open("urls.txt") as f:
    urls = re.findall(r'https?://.*?\.jpeg', f.read())
print(urls)

Upvotes: 0

Vasily Tomilchik

Reputation: 61

Try this

import re
output = re.findall('(http.*?jpeg)', text)

Output:

['http://d1spq65clhrg1f.cloudfront.net/uploads/image_request/image/182/182382/182382534/cloudsight.jpeg']

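This makes re.findall capture a single group, "http.*?jpeg", instead of the three groups in your regex. When a pattern contains several groups, re.findall returns one tuple per match with one element per group, which is where the commas in your output come from.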
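To illustrate how the number of capturing groups changes what re.findall returns, here is a minimal sketch. The sample text and the example.com URLs are made up for illustration and are not from the question.

```python
import re

# Hypothetical sample text, made up for this illustration
text = "see http://example.com/a.jpeg and http://example.com/b.jpeg"

# No groups: each result is the full match as a string
no_groups = re.findall(r'http.*?jpeg', text)

# One group: each result is the text of that single group
one_group = re.findall(r'(http.*?jpeg)', text)

# Three groups: each result is a tuple with one element per group
three_groups = re.findall(r'(http)(.*?)(jpeg)', text)

# Non-capturing groups (?:...) group without capturing,
# so the results are full-match strings again
non_capturing = re.findall(r'(?:http)(?:.*?)(?:jpeg)', text)

print(no_groups)
print(three_groups)
```

The first print shows a flat list of URL strings, the second a list of 3-tuples like the one in the question.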

Upvotes: 1

Jay Shankar Gupta

Reputation: 6088

output = re.findall(r'https?:.*?\.jpeg', text)

Example

import re
text=" asdd adf sdf sf http://d1spq65clhrg1f.cloudfront.net/uploads/image_request/image/182/182382/182382534/cloudsight.jpeg asfd ads f ads asdfadfasf asd asdf asdf asdf as"
output = re.findall(r'https?:.*?\.jpeg', text)
print(output)

Output:

['http://d1spq65clhrg1f.cloudfront.net/uploads/image_request/image/182/182382/182382534/cloudsight.jpeg']
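As a side note, a pattern built on ".*?" can run across whitespace into unrelated text when a URL does not end in .jpeg. The sketch below compares a loose pattern with a stricter variant that escapes the dot and only allows non-whitespace characters; the sample text and the names loose and strict are assumptions made up for illustration.

```python
import re

# Hypothetical text: the URL ends in .html, and "myjpeg" appears later
text = "http://x.com/page.html see the myjpeg file"

# Loose: "." before "jpeg" matches any character, and ".*?" may cross spaces
loose = re.findall(r'https?:.*?.jpeg', text)

# Stricter: literal dot, and \S+? stops the match at the first whitespace
strict = re.findall(r'https?://\S+?\.jpeg', text)

print(loose)
print(strict)
```

Here the loose pattern returns a string that runs from the URL through "myjpeg", while the stricter one returns an empty list because no .jpeg URL is present.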

Upvotes: 0

Macintosh_89

Reputation: 708

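I am not sure if this is what you are looking for, but you can join the three groups of each match back into one string:

import re
# re.findall returns one tuple per match here, so join the parts of each tuple
output = ["".join(parts) for parts in re.findall('(http)(.*?)(jpeg)', text)]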

Upvotes: 0
