Reputation: 419
I have written this code to replace URLs with their page titles. It does replace each URL with its title as required, but it prints the title on the next line.
twfile.txt contains these lines:
link1 http://t.co/HvKkwR1c
no link line
Output tw2file:
link1
Instagram
no link line
But I want the output in this form:
link1 Instagram
no link line
What should I do?
My Code:
from bs4 import BeautifulSoup
import urllib
output = open('tw2file.txt','w')
with open('twfile.txt','r') as inputf:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    ##print list1[i]
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    ##print list1[i]
                    list1[i] = ''.join(ch for ch in list1[i])
                else:
                    list1[i] = ''.join(ch for ch in list1[i])
            line = ' '.join(list1)
            print line
            output.write(line)
        except:
            pass
inputf.close()
output.close()
Upvotes: 0
Views: 155
Reputation: 6186
Try this code (the changed lines are marked with # here):
from bs4 import BeautifulSoup
import urllib
with open('twfile.txt','r') as inputf, open('tw2file.txt','w') as output:
    for line in inputf:
        try:
            list1 = line.split(' ')
            for i in range(len(list1)):
                if "http" in list1[i]:
                    response = urllib.urlopen(list1[i])
                    html = response.read()
                    soup = BeautifulSoup(html)
                    list1[i] = soup.html.head.title
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
                else:
                    list1[i] = ''.join(ch for ch in list1[i]).strip() # here
            line = ' '.join(list1)
            print line
            output.write('{}\n'.format(line)) # here
        except:
            pass
BTW, you are using Python 2.7.x+, so the two open calls can be expressed in the same with clause. Also, the explicit close calls are unnecessary, because the with statement closes both files on exit.
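To see why stripping helps, here is a minimal offline sketch of the same idea. It is not the code above: `replace_urls` and `fake_title` are hypothetical names, and `fake_title` stands in for the urllib/BeautifulSoup lookup on the assumption that the page's title text comes back wrapped in newlines and indentation. It uses `split()` with no argument, which also drops the trailing newline that `split(' ')` leaves attached to the last token.

```python
def replace_urls(line, fetch_title):
    """Return `line` with every URL token replaced by its page title.

    `fetch_title` is a hypothetical callable (url -> title string) that
    stands in for the real network lookup so the sketch runs offline.
    """
    words = []
    for token in line.split():  # no argument: splits on any whitespace, drops '\n'
        if token.startswith("http"):
            # Collapse newlines and indentation inside the title so it
            # stays on the same output line as the surrounding words.
            token = " ".join(fetch_title(token).split())
        words.append(token)
    return " ".join(words)


def fake_title(url):
    # Assumed shape of a <title> string: text surrounded by whitespace.
    return "\n    Instagram\n"


lines = ["link1 http://t.co/HvKkwR1c\n", "no link line\n"]
result = [replace_urls(l, fake_title) for l in lines]
# result == ['link1 Instagram', 'no link line']
```

Because the newline is removed along with the other whitespace, each result needs a '\n' added back when it is written to the file, which is what the `'{}\n'.format(line)` line does.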
Upvotes: 1
Reputation: 3340
Regarding the content written to a file
fileobject = open("bar", 'w' )
fileobject.write("Hello, World\n") # newline is inserted by '\n'
fileobject.close()
Regarding console output
Change print line to print line, (note the trailing comma).
In Python 2, print writes a '\n' character at the end unless the print statement ends with a comma.
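For comparison, a small sketch of the same behaviour on Python 3, where print is a function and the trailing comma is replaced by the `end` argument. This assumes Python 3 and uses in-memory buffers instead of the console and a real file; the variable names are only for illustration.

```python
import io

line = "link1 Instagram\n"  # a line that already ends with a newline

with_default = io.StringIO()
print(line, file=with_default)           # print appends its own '\n'

without_extra = io.StringIO()
print(line, end="", file=without_extra)  # end="" suppresses the extra '\n'

written = io.StringIO()
written.write(line)                      # write() never adds a newline
```

So a line read from a file, which still carries its own '\n', shows up double-spaced with a plain print but is written unchanged by write().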
Upvotes: 1