Reputation: 11
import re, urllib.request

patern = re.compile(r'image/\w*\W*\w*\.\jpg', re.I|re.M)

file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()

def lic(li):
    if not li:
        pass
    else:
        print(li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])

def main():
    for i in range(len(a)):
        ur = urllib.request.urlopen(a[i])
        mf = re.findall(patern, str(ur.read()))
        lic(mf)

f = open('APODImgs.txt','w')
main()
f.close()
What's wrong with my code? I'm trying to write a text file listing all the JPG pictures from Astronomy Picture of the Day, but the file APODImgs.txt ends up empty. The mf list is sometimes empty; maybe that is my problem.
APODLinks.txt contains URLs like this:
apod.nasa.gov/apod/ap140815.html
apod.nasa.gov/apod/ap140814.html
apod.nasa.gov/apod/ap140813.html
(7000 lines of URLs in total)
APODImgs.txt should look like this:
apod.nasa.gov/apod/image/1408/Persei93_1abolfath.jpg
apod.nasa.gov/apod/image/1408/Supermoon_20140810.JPG
apod.nasa.gov/apod/image/1408/m57_nasagendler_3000.jpg
apod.nasa.gov/apod/image/1408/HebesChasma_esa_1024.jpg
...
Please help, and sorry for my English.
Upvotes: 1
Views: 72
Reputation: 11
I changed my code and now it works:
import re, urllib.request

patern = re.compile(r'image/\w*\W*\w*\.jpg', re.I|re.M)

file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()

def lic(li):
    if not li:
        print("No matches found")
    else:
        print('http://apod.nasa.gov/apod/%s' % li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])

def main():
    for i in range(len(a)):
        try:
            ur = urllib.request.urlopen(a[i])
        except Exception:
            # Skip this URL; otherwise ur would be undefined or stale below.
            print('Could not open %s' % a[i])
            continue
        mf = re.findall(patern, str(ur.read()))
        lic(mf)

f = open('APODImgs.txt','w')
main()
f.close()
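A slightly more defensive variant, offered only as a sketch. The helper names normalize_url and first_image_url are made up for illustration, and it assumes that lines in APODLinks.txt may be blank or stored without the http:// scheme (as in the listing in the question), which urlopen would reject.

```python
import re
import urllib.request

# Same pattern as in the script above; re.I also catches upper-case .JPG.
IMAGE_RE = re.compile(r'image/\w*\W*\w*\.jpg', re.I)
BASE = 'http://apod.nasa.gov/apod/'


def normalize_url(line):
    """Return a fetchable URL for one line of APODLinks.txt, or None if blank."""
    line = line.strip()
    if not line:
        return None
    # Assumption: the scheme may be missing from the stored links.
    if not line.startswith(('http://', 'https://')):
        line = 'http://' + line
    return line


def first_image_url(html):
    """Return the absolute URL of the first matching image, or None."""
    match = IMAGE_RE.search(html)
    return BASE + match.group(0) if match else None


def main(links_path='APODLinks.txt', out_path='APODImgs.txt'):
    with open(links_path) as links, open(out_path, 'w') as out:
        for line in links:
            url = normalize_url(line)
            if url is None:
                continue
            try:
                with urllib.request.urlopen(url) as response:
                    html = response.read().decode('utf-8', errors='replace')
            except OSError as exc:  # URLError and HTTPError derive from OSError
                print('Skipping %s: %s' % (url, exc))
                continue
            image_url = first_image_url(html)
            if image_url:
                out.write(image_url + '\n')
```

Calling main() reads APODLinks.txt and writes APODImgs.txt. Keeping the URL cleanup and the regex in separate functions means both can be checked on sample strings without touching the network.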
Upvotes: 0
Reputation: 87341
Most probably the condition not li is always true in lic(), because your regexp doesn't match.
To figure it out, print the HTTP response body:
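urr = urllib.request.urlopen(a[i]).read()
print(repr(urr))
# read() returns bytes in Python 3; convert to str as in the question
# before applying the str pattern.
mf = re.findall(patern, str(urr))
print(repr(mf))
lic(mf)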
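The pattern can also be checked offline against a few sample strings. This is only a sketch: the HTML snippets are made up, the hyphenated file name is hypothetical, and loose is one possible looser pattern rather than a recommendation. It also checks whether the \j in the question's pattern is rejected, which I would expect on recent Python 3 versions, where unknown letter escapes are errors.

```python
import re

# Pattern with the stray backslash removed.
strict = re.compile(r'image/\w*\W*\w*\.jpg', re.I)
# Hypothetical looser pattern: everything up to a quote, whitespace or '>'.
loose = re.compile(r'image/[^"\'\s>]+?\.jpe?g', re.I)


def first_match(pattern, html):
    """Return the first match of pattern in html, or None."""
    m = pattern.search(html)
    return m.group(0) if m else None


samples = [
    '<a href="image/1408/Persei93_1abolfath.jpg">',
    '<a href="image/1408/Supermoon_20140810.JPG">',
    '<a href="image/1408/some-hyphenated-name.jpg">',  # hypothetical name
]

for html in samples:
    print(first_match(strict, html), '|', first_match(loose, html))

# Try compiling the pattern exactly as written in the question.
try:
    re.compile(r'image/\w*\W*\w*\.\jpg')
    bad_escape_rejected = False
except re.error:
    bad_escape_rejected = True
print('\\j rejected:', bad_escape_rejected)
```

If the strict pattern returns None for some pages, as it does for the hyphenated name here, that would explain why mf is sometimes empty.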
Upvotes: 1