Reputation: 121
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
for link in soup.find_all('img'):
    print link.get('src')
file = open("parseddata.txt", "wb")
file.write(link.get('src')+"\n")
file.flush()
Hello, I wanted to experiment with BeautifulSoup and parsed some YouTube pages. I get about 25 lines of links out of this, but when I look at the file, only the last one is written (and only a small part of it). I tried different open modes and the file.close() function, but nothing worked. Does anyone have a clue?
Upvotes: 0
Views: 12928
Reputation: 5876
You are looping through every img tag in this line and printing each one:
for link in soup.find_all('img'):
    print link.get('src')
However, you are not writing to the file inside that loop; you only write link.get('src')+'\n' once, at the very end.
This will only write what link is currently assigned to, which is simply the last img tag that you found in your loop above. That is why only one 'src' value will be written to the output file.
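(For illustration only, this tiny snippet is not from the original post: it just shows that after a for loop finishes, the loop variable still holds the last item, which is why a single write() placed after the loop records only that last value.)

# hypothetical list of src values, just to illustrate the point
srcs = ['a.png', 'b.png', 'c.png']
for src in srcs:
    print src          # runs three times, once per item

print src              # runs once and prints only 'c.png', the last item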
You need to write each line to the file inside the loop that goes through the img tags you are interested in. That requires a little rearranging:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
file = open("parseddata.txt", "wb")       # open the output file before the loop
for link in soup.find_all('img'):
    print link.get('src')
    file.write(link.get('src')+"\n")      # write inside the loop, once per img tag
    file.flush()
file.close()
You should also remember to close the file, as I have done in the last line of the snippet above.
Edit: As per Hooked's comment below, here is what this snippet would look like if you use the with keyword. Using with will close the file automatically for you as soon as the indented block ends, so you don't even have to think about it:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
with open("parseddata.txt", "wb") as file:
    for link in soup.find_all('img'):
        print link.get('src')
        file.write(link.get('src')+"\n")
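As a side note (my own addition, not part of the answer above): if you are on Python 3, the same approach would look roughly like this, with print() as a function, the output file opened in text mode, and a parser named explicitly so BeautifulSoup does not warn about it:

from bs4 import BeautifulSoup

# sketch assuming Python 3 and the built-in "html.parser"
with open("youtube.htm") as page:
    soup = BeautifulSoup(page, "html.parser")

with open("parseddata.txt", "w") as outfile:   # text mode instead of "wb"
    for link in soup.find_all('img'):
        src = link.get('src')
        print(src)
        outfile.write(src + "\n")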
Upvotes: 5