Tom

Reputation: 7091

Python looping to read and parse all in a directory

```python
import os
import re

class __init__:
    path = "articles/"
    files = os.listdir(path)
    files.reverse()

    def iterate(Files, Path):

        def handleXml(content):

            months = ['', 'January', 'February', 'March', 'April', 'May', 'June', 'July', 'August', 'September', 'October', 'November', 'December']

            parse = re.compile('<(.*?)>(.*?)<(.*?)>').findall(content)
            day = parse[1][1]
            month = months[int(parse[2][1])]
            dayN = parse[3][1]
            year = parse[4][1]
            hour = parse[5][1]
            min = parse[6][1]
            amPM = parse[7][1]
            title = parse[9][1]
            author = parse[10][1]
            article = parse[11][1]
            category = parse[12][1]

        if len(Files) > 5:
            del Files[5:]

        for file in Files:
            file = "%s%s" % (Path, file)
            f = open(file, 'r')
            handleXml(f.read())
            f.close()

    iterate(files, path)
```

It runs on start, and if I check the files list it contains all the file names. But when I loop through them, they just do not work: only the first one is displayed. If I return file, I only get the first two, and if I return parse, the result is not identical even for duplicate files. None of this makes any sense.

I am trying to make a simple blog using Python, and because my server has a very old version of Python, I cannot use modules like glob; everything needs to be as basic as possible.

The files list contains all the files in the directory, which is good enough for me. I do not need to go through any subdirectories inside the articles directory.
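A minimal sketch of listing only the regular files directly inside one directory, using nothing newer than `os.listdir` and `os.path`. The name `list_article_files` and its `limit` parameter are hypothetical. Sorting the names is an assumption: `os.listdir` returns entries in arbitrary order, so the original `files.reverse()` alone does not guarantee any particular order.

```python
import os


def list_article_files(path, limit=5):
    """Return up to `limit` paths of regular files directly inside `path`.

    Only os.listdir and os.path are used, so this should also run on old
    Python versions. Subdirectories are skipped, not descended into.
    """
    names = os.listdir(path)
    # Assumption: "newest" articles have the highest names, so sort and
    # reverse; os.listdir itself makes no ordering promise.
    names.sort()
    names.reverse()
    paths = []
    for name in names:
        full = os.path.join(path, name)
        if os.path.isfile(full):
            paths.append(full)
    # Slicing builds a new list instead of deleting from a shared one.
    return paths[:limit]
```

For example, `list_article_files("articles/")` would return at most five file paths, leaving the list returned by `os.listdir` untouched for any other code that needs it.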

But when I try to output parse, I get different results even for duplicate files.
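In the code above, `handleXml` only assigns local variables and returns nothing, so every parsed value is discarded after each call. Also, by default `.` in a regular expression does not match a newline, so `(.*?)` cannot capture a value that spans several lines; that may be one reason the positional indices (`parse[9]`, `parse[11]`, ...) shift from file to file. A minimal sketch that returns the fields as a dict keyed by tag name, using a backreference instead of positions, assuming each field is a leaf element with plain-text content such as `<title>Hello</title>`. The names `handle_xml`, `FIELD_RE` and `MONTHS` are hypothetical.

```python
import re

MONTHS = ['', 'January', 'February', 'March', 'April', 'May', 'June',
          'July', 'August', 'September', 'October', 'November', 'December']

# <tag>text</tag> where text contains no '<'. [^<]* also matches newlines,
# unlike '.', and \1 requires the closing tag to match the opening one.
FIELD_RE = re.compile(r"<(\w+)>([^<]*)</\1>")


def handle_xml(content):
    """Return the article's leaf fields as a dict keyed by tag name."""
    fields = {}
    for tag, value in FIELD_RE.findall(content):
        fields[tag] = value.strip()
    if "month" in fields:
        # Assumption: <month> holds a number from 1 to 12.
        fields["month"] = MONTHS[int(fields["month"])]
    return fields
```

Because the function returns a value, the loop can collect results, e.g. `articles.append(handle_xml(f.read()))`, and two identical files will produce equal dicts.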

Thanks,

Upvotes: 1

Views: 3011

Answers (2)

rob

Reputation: 37684

As stated in the comments, the actual recursion is missing.
Even if it exists somewhere else in the code, the recursive call is the typical place where things go wrong, so I would suggest double-checking it.
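A minimal sketch of a hand-written recursive listing built only on `os.listdir` and `os.path`, showing where the recursive call goes and that its result has to be merged back into the caller's list. The name `list_files_recursive` is hypothetical, and skipping symlinked directories is an assumption made to avoid infinite loops.

```python
import os


def list_files_recursive(path):
    """Return the paths of all files under `path`, at any depth."""
    found = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        if os.path.isdir(full) and not os.path.islink(full):
            # The recursive call: its result must be merged into `found`,
            # otherwise everything below this directory is silently lost.
            found.extend(list_files_recursive(full))
        elif os.path.isfile(full):
            found.append(full)
    return found
```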

However, why not use os.walk? It walks the whole directory tree without you having to reinvent the (recursive) wheel. It was introduced in Python 2.3, though, and I do not know how old your Python is.
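For comparison, a minimal sketch of the same traversal with `os.walk`, which yields a `(dirpath, dirnames, filenames)` tuple for each directory it visits. The name `walk_files` is hypothetical; the in-place `.sort()` calls (rather than `sorted()`, which only arrived in 2.4) just make the order deterministic.

```python
import os


def walk_files(path):
    """Return every non-directory entry under `path`, in a stable order."""
    found = []
    for dirpath, dirnames, filenames in os.walk(path):
        # In the default top-down walk, sorting dirnames in place also
        # controls the order in which subdirectories are visited.
        dirnames.sort()
        filenames.sort()
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```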

Upvotes: 0

orip

Reputation: 75547

Could it be because of:

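```python
del Files[5:]
```

It removes every entry from index 5 onward, and because `Files` is the same list object as `files`, the original list is truncated as well. Instead of using `del`, you can try iterating over a slice, which is a copy:

```python
for file in Files[:5]:
    #...
```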
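A small sketch of the aliasing problem described above. Both helpers are hypothetical names: the first mutates the list it receives, just as `del Files[5:]` mutates the list passed in as `files`, while the second returns a new list.

```python
def truncate_in_place(items, n=5):
    # `del` edits the list object itself, so every name bound to it
    # (including the caller's variable) sees the shorter list.
    del items[n:]
    return items


def first_n(items, n=5):
    # A slice builds a new list; the caller's list is left untouched.
    return items[:n]


files = ["a", "b", "c", "d", "e", "f", "g"]
top = first_n(files)        # files still has all 7 entries
truncate_in_place(files)    # files now keeps only its first 5 entries
```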

Upvotes: 1
