Reputation: 121
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
for link in soup.find_all('img'):
    print link.get('src')
file = open("parseddata.txt", "wb")
file.write(link.get('src')+"\n")
file.flush()
Hello, I wanted to experiment with BeautifulSoup and parsed some YouTube pages. I get about 25 lines of links out of this, but when I look at the file, only the last one is written (and only a small part of it). I tried different open modes and the file.close() function, but nothing worked. Does anyone have a clue?
Upvotes: 0
Views: 12928
Reputation: 5876
You are looping through every img tag in this line and printing each one:
for link in soup.find_all('img'):
    print link.get('src')
However, you are not writing to the file inside that loop; you only write link.get('src')+'\n' once, at the very end.
This will only write what link is currently assigned to, which is simply the last img tag that you found in your loop above. That is why only one 'src' value will be written to the output file.
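(For illustration only, this tiny snippet is not from the original post: it just shows that after a for loop finishes, the loop variable still holds the last item, which is why a single write() placed after the loop records only that last value.)

# hypothetical list of src values, just to illustrate the point
srcs = ['a.png', 'b.png', 'c.png']
for src in srcs:
    print src          # runs three times, once per item

print src              # runs once and prints only 'c.png', the last item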
You need to write each line to the file inside the loop that goes through the img tags you are interested in. That requires a little rearranging:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
file = open("parseddata.txt", "wb")       # open the output file before the loop
for link in soup.find_all('img'):
    print link.get('src')
    file.write(link.get('src')+"\n")      # write inside the loop, once per img tag
    file.flush()
file.close()
You should also remember to close the file, as I have done in the last line of the snippet above.
Edit: As per Hooked's comment below, here is what this snippet would look like if you use the with keyword. Using with will close the file automatically for you as soon as the indented block ends, so you don't even have to think about it:
from bs4 import BeautifulSoup
soup = BeautifulSoup(open("youtube.htm"))
with open("parseddata.txt", "wb") as file:
    for link in soup.find_all('img'):
        print link.get('src')
        file.write(link.get('src')+"\n")
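As a side note (my own addition, not part of the answer above): if you are on Python 3, the same approach would look roughly like this, with print() as a function, the output file opened in text mode, and a parser named explicitly so BeautifulSoup does not warn about it:

from bs4 import BeautifulSoup

# sketch assuming Python 3 and the built-in "html.parser"
with open("youtube.htm") as page:
    soup = BeautifulSoup(page, "html.parser")

with open("parseddata.txt", "w") as outfile:   # text mode instead of "wb"
    for link in soup.find_all('img'):
        src = link.get('src')
        print(src)
        outfile.write(src + "\n")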
Upvotes: 5