user3096578

Reputation: 11

Writing a list of strings to a txt file, but the file is empty. Why?

import re, urllib.request

patern = re.compile(r'image/\w*\W*\w*\.\jpg', re.I|re.M)

file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()


def lic(li):
    if not li:
        pass
    else:
        print(li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])


def main():
    for i in range(len(a)):
        ur = urllib.request.urlopen(a[i])
        mf = re.findall(patern, str(ur.read()))
        lic(mf)

f = open('APODImgs.txt','w')
main()
f.close()

What's wrong with my code? I'm trying to write a txt file with the links to all the jpg pictures from Astronomy Picture of the Day, but the file APODImgs.txt is empty. The mf list is sometimes empty; maybe that is my problem.

APODLinks.txt contains URLs like this:

apod.nasa.gov/apod/ap140815.html
apod.nasa.gov/apod/ap140814.html
apod.nasa.gov/apod/ap140813.html

It has about 7000 lines of URLs.

APODImgs.txt should look like this:

apod.nasa.gov/apod/image/1408/Persei93_1abolfath.jpg
apod.nasa.gov/apod/image/1408/Supermoon_20140810.JPG
apod.nasa.gov/apod/image/1408/m57_nasagendler_3000.jpg
apod.nasa.gov/apod/image/1408/HebesChasma_esa_1024.jpg
...

Please help, and sorry for my English.

Upvotes: 1

Views: 72

Answers (2)

user3096578

Reputation: 11

I changed my code and now it works:

import re, urllib.request

patern = re.compile(r'image/\w*\W*\w*\.jpg', re.I|re.M)

file = open('APODLinks.txt','r')
rf = file.read()
a = rf.split('\n')
file.close()


def lic(li):
    if not li:
        print("No matches found")  
    else:
        print('http://apod.nasa.gov/apod/%s' % li[0])
        f.write('http://apod.nasa.gov/apod/%s\n' % li[0])


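def main():
    for i in range(len(a)):
        try:
            ur = urllib.request.urlopen(a[i])
        except Exception as exc:
            # Skip this URL; otherwise ur would be undefined or stale below
            print('Could not open %s: %s' % (a[i], exc))
            continue
        mf = re.findall(patern, str(ur.read()))
        lic(mf)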

f = open('APODImgs.txt','w')
main()
f.close()
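
One more thing that may be worth checking: the sample lines shown for APODLinks.txt have no http:// prefix, and urllib.request.urlopen raises a ValueError for a URL string without a scheme. Also, rf.split('\n') leaves an empty string at the end when the file ends with a newline. The sketch below assumes the file really looks like the sample; normalize_urls is a hypothetical helper name, not part of the code above.

```python
def normalize_urls(lines, scheme='http://'):
    """Hypothetical helper: drop blank lines and add a scheme where missing."""
    urls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip the empty entry left by a trailing newline
        if not line.startswith(('http://', 'https://')):
            line = scheme + line  # assumption: plain http is acceptable here
        urls.append(line)
    return urls


# Sample input copied from the question's description of APODLinks.txt
sample = 'apod.nasa.gov/apod/ap140815.html\napod.nasa.gov/apod/ap140814.html\n'
cleaned = normalize_urls(sample.split('\n'))
print(cleaned)
```

With something like this, the loop in main() could iterate over normalize_urls(a) instead of a.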

Upvotes: 0

pts

Reputation: 87341

Most probably the condition not li is always true in lic because your regexp never matches, so f.write is never reached.

To figure it out, print the HTTP response body:

# Decode the body so the str pattern can be applied to it
urr = urllib.request.urlopen(a[i]).read().decode('utf-8', errors='replace')
print(repr(urr))
mf = re.findall(patern, urr)
print(repr(mf))
lic(mf)
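
The regexp can also be checked offline, without downloading anything. This is only a rough sketch: sample_html is a made-up fragment standing in for a real APOD page body, and the pattern is the one from the question with the stray backslash before jpg removed. The last part tries the original pattern as well, since newer Python versions may refuse to compile the \j escape.

```python
import re

# Pattern from the question with the stray backslash before "jpg" removed.
pattern = re.compile(r'image/\w*\W*\w*\.jpg', re.I)

# Hypothetical fragment standing in for a downloaded APOD page body.
sample_html = (
    '<a href="image/1408/Persei93_1abolfath.jpg">'
    '<img src="image/1408/Persei93_1abolfath_960.jpg"></a>'
)

matches = pattern.findall(sample_html)
print(matches)

# The original pattern contains "\j", which newer Python versions may reject.
try:
    re.compile(r'image/\w*\W*\w*\.\jpg', re.I)
    print('original pattern compiled')
except re.error as exc:
    print('original pattern failed to compile:', exc)
```

If matches is empty for a real page body, the pattern is the problem; if it is not, the download step is the more likely culprit.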

Upvotes: 1
