Reputation: 13
I'm trying to loop through a text file containing a list of URLs and have my Python script parse each URL in the file.
The code only processes the LAST line in the file, when it should process every line and append the results to the output file.
I have no idea what to do. I appreciate your help. Thanks!
import feedparser  # pip install feedparser
from BeautifulSoup import BeautifulStoneSoup
from BeautifulSoup import BeautifulSoup
import re

urls = open("c:/a2.txt", "r")  # file with rss urls
for lines in urls:
    d = feedparser.parse(lines)  # feedparser is supposed to process every url in the file(urls)

statusupdate = d.entries[0].description
soup = BeautifulStoneSoup(statusupdate)

for e in d.entries:
    print(e.title)
    print(e.link)
    print(soup.find("img")["src"])
    print("\n")  # 2 newlines
    # writes title,link,image to a file and adds some characters
    f = open(r'c:\a.txt', 'a')
    f.writelines('"')
    f.writelines(e.title)
    f.writelines('"')
    f.writelines(",")
    f.writelines('"')
    f.writelines(e.link)
    f.writelines('"')
    f.writelines(",")
    f.writelines('"')
    f.writelines(soup.find("img")["src"])
    f.writelines('"')
    f.writelines(",")
    f.writelines("\n")
    f.close()
Upvotes: 1
Views: 3175
Reputation: 122326
for lines in urls:
    d = feedparser.parse(lines)

This loop simply keeps going and keeps reassigning something to the variable d. That means, when the loop is finished, d will have the values associated with the last line.
If you wish to process every line, you need to do something with every value of d. For example, you could put every d.entries[0].description in a list and then iterate over that list to process it.
urls = open("c:/a2.txt", "r")  # file with rss urls
results = []
for lines in urls:
    results.append(feedparser.parse(lines))

contents = []
for r in results:
    statusupdate = r.entries[0].description
    soup = BeautifulStoneSoup(statusupdate)
    for e in r.entries:
        contents.append((e.title, e.link, soup.find("img")["src"]))

with open(r'c:\a.txt', 'a') as f:
    for c in contents:
        f.writelines('"')
        f.writelines(c[0])
        f.writelines('"')
        f.writelines(",")
        f.writelines('"')
        f.writelines(c[1])
        f.writelines('"')
        f.writelines(",")
        f.writelines('"')
        f.writelines(c[2])
        f.writelines('"')
        f.writelines(",")
        f.writelines("\n")
Upvotes: 1
Reputation: 63707
There are a couple of issues in your program.
My suggestion would be to keep the open statement for the output file outside the loop, and to indent all of the processing statements so that they are part of the loop that iterates over the input file.
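A minimal sketch of that restructuring might look like the following. The names export_feeds and first_image_src are hypothetical, and first_image_src is only a regex stand-in for the soup.find("img")["src"] lookup in the question so that the sketch does not depend on BeautifulSoup. The parse argument is also an assumption, added so the function can be tried without network access; left as None, it falls back to feedparser.parse.

```python
import re


def first_image_src(html):
    """Hypothetical stand-in for soup.find("img")["src"]; returns '' if no image."""
    match = re.search(r'<img[^>]+src=["\']([^"\']+)', html)
    return match.group(1) if match else ""


def export_feeds(url_path, out_path, parse=None):
    """Parse every URL listed in url_path and append one line per entry to out_path."""
    if parse is None:
        import feedparser  # pip install feedparser
        parse = feedparser.parse

    # The output file is opened once, outside the loop over the input file.
    with open(url_path) as urls, open(out_path, "a") as out:
        for line in urls:
            url = line.strip()  # drop the trailing newline
            if not url:
                continue  # skip blank lines
            feed = parse(url)
            # Everything below is indented so that it runs for every URL.
            for entry in feed.entries:
                image = first_image_src(entry.description)
                out.write('"%s","%s","%s",\n' % (entry.title, entry.link, image))
```

With the paths from the question, this would be called as export_feeds("c:/a2.txt", r"c:\a.txt"). Unlike the original, the image is looked up per entry rather than once per feed, which is a design choice of this sketch.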
Upvotes: 0
Reputation: 65781
Maybe you shouldn't assign the value returned by feedparser.parse() to the same variable every time?
At least with your current indentation, it's the only thing that happens inside the loop.
statusupdate = d.entries[0].description only runs once and operates on the last value of d, because it's outside the loop.
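The effect can be reproduced without feedparser. In this small sketch, parse is a hypothetical stand-in that just tags its argument, and the URL names are made up for illustration.

```python
def parse(url):
    """Hypothetical stand-in for feedparser.parse(); just tags the URL."""
    return "parsed " + url


urls = ["feed-a", "feed-b", "feed-c"]

# Only the assignment is inside the loop, so d is overwritten on each pass.
for url in urls:
    d = parse(url)
last_only = [d]  # runs once, after the loop, with the final value of d

# Moving the work inside the loop handles every URL.
every_url = []
for url in urls:
    d = parse(url)
    every_url.append(d)

print(last_only)  # ['parsed feed-c']
print(every_url)  # ['parsed feed-a', 'parsed feed-b', 'parsed feed-c']
```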
Upvotes: 0