Nilani Algiriyage

Reputation: 35666

Read a file, extract the URL, and rewrite it in Python

I'm reading a text file (a.txt) in the following format:

http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31

Then I need to extract only the www.example.com part, together with /images/webcard.jpg 121.10.208.31, and write the result to the same file or a separate one. In this case I'm writing it to b.txt.

from urlparse import urlparse 
f = open('a.txt','r')
fo = open('b','w')


for line in f:
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')

The above code gives the following error. How can I achieve this?

Traceback (most recent call last):
  File "prittyprint.py", line 17, in <module>
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')
IndexError: list index out of range

Upvotes: 0

Views: 168

Answers (1)

Srikar Appalaraju

Reputation: 73608

It could be that some lines in a.txt do not follow this format, so splitting them yields fewer than three fields. You can try this:

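from urlparse import urlparse  # Python 2; on Python 3 use: from urllib.parse import urlparse

f = open('a.txt', 'r')
fo = open('b', 'w')

for line in f:
    split_line = line.split()  # split on whitespace; this also drops the trailing newline
    if len(split_line) >= 3:
        # keep only the host of the URL, followed by the second and third fields
        fo.write(urlparse(split_line[0]).netloc + ' ' + split_line[1] + ' ' + split_line[2] + '\n')
    else:
        print "ERROR: some other line: %s" % (line)  # report it and continue with the next line

f.close()
fo.close()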
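
On Python 3 the same idea might look roughly like the sketch below. It assumes every valid line has three whitespace-separated fields (URL, path, IP address). The helper names rewrite_line and rewrite_file, and the default file names, are hypothetical and only there for illustration.

```python
from urllib.parse import urlparse  # Python 3 location of urlparse


def rewrite_line(line):
    """Return 'host path ip' for a well-formed line, or None otherwise."""
    fields = line.split()  # whitespace split; drops the trailing newline
    if len(fields) < 3:
        return None  # malformed line: not enough fields
    url, path, ip = fields[:3]
    return ' '.join([urlparse(url).netloc, path, ip])


def rewrite_file(src='a.txt', dst='b.txt'):
    """Write rewritten lines from src to dst; return the lines that were skipped."""
    skipped = []
    with open(src) as f, open(dst, 'w') as fo:
        for line in f:
            out = rewrite_line(line)
            if out is None:
                skipped.append(line)  # keep malformed lines for inspection
            else:
                fo.write(out + '\n')
    return skipped
```

Note that a line with only one space in it, such as the sample exactly as it is written in the question, splits into two fields. Indexing the third field of such a line raises the IndexError, and this sketch would skip it instead.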

Upvotes: 3
