BBedit

How to get regex to return string (not a regex object)?

I've read the regex documentation, but it is very confusing for a beginner programmer like me, so posting here is my last resort.

# Tivo Notifier
import os, re

WATCH_DIR = "D:/tivo"
TO_FIND = [".*big.brother.uk.s15.*", ".*mock.the.week.*", ".*family.guy.*"]

# open history log file
history = open("history.txt", "w+")

# get downloaded files
files = os.listdir(WATCH_DIR)

# compare each file to regex patterns
for pattern in TO_FIND:
    regex =  re.compile(pattern)
    match = [m.group(0) for file in files for m in [regex.search(file)] if m]

    for filename in match:
        if filename not in history.read():      # if a new match is found
            print "new:", filename              # display new match file name
            history.write(filename)             # add file name to history file
history.close()

The problem here is that it writes a ton of garbage to the history file: http://pastebin.com/3C5iVbU7

I'm assuming this is because filename is not a string but probably some kind of regex object. I cannot find anything in the documentation about how to get a string back.

I would like to add only the file name to the history file, not the garbage text this script currently writes.

Could someone tell me how to do this?

Answers (1)

mhawke

Here's a more straightforward way that uses glob instead of regular expressions. It also uses sets to keep track of the history and of the newly found files.
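Since glob replaces the regular expressions here, it may help to see how the two pattern styles compare. glob relies on the wildcard rules of the fnmatch module, where `*` matches any run of characters and `.` is a literal dot; in the question's regexes, `.` matches any single character. The file names below are made up for illustration, and fnmatch.fnmatchcase is used so the result does not depend on the OS (on Windows, glob matches case-insensitively).

```python
import fnmatch
import re

# Hypothetical sample file names, invented for this comparison.
names = ["family.guy.s11e05.mp4", "Family_Guy_s11e05.mp4", "familyXguy.mp4"]

glob_pattern = "*family.guy*"      # shell-style wildcard, as in the answer
regex_pattern = ".*family.guy.*"   # regular expression from the question

# fnmatchcase applies glob-style wildcard rules, always case-sensitively
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, glob_pattern)]
# re.search with the question's regex; here "." matches any character
regex_hits = [n for n in names if re.search(regex_pattern, n)]

print(glob_hits)   # ['family.guy.s11e05.mp4']
print(regex_hits)  # ['family.guy.s11e05.mp4', 'familyXguy.mp4']
```

So the glob patterns are slightly stricter: a name like familyXguy.mp4 matches the regex but not the wildcard pattern.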

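import os, glob

WATCH_DIR = 'D:/tivo'
TO_FIND = ['*big.brother.uk.s15*', '*mock.the.week*', '*family.guy*']
HISTORY_FILE = 'history.txt'

# load the names seen on previous runs (there is no file yet on the first run)
history = set()
if os.path.exists(HISTORY_FILE):
    with open(HISTORY_FILE) as fh:
        history = set(fh.read().splitlines())

# collect every file in WATCH_DIR that matches one of the patterns
new_files = set()
for pattern in TO_FIND:
    files = glob.glob(os.path.join(WATCH_DIR, pattern))
    # optionally strip directories from file names
    files = [os.path.basename(f) for f in files]
    new_files.update(files)

# keep only the names that are not already in the history
new_files = new_files.difference(history)
for f in sorted(new_files):
    print("new: %s" % f)

# rewrite the history file with one file name per line
history.update(new_files)
with open(HISTORY_FILE, 'w') as fh:
    fh.write('%s\n' % '\n'.join(sorted(history)))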
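If you would rather keep the regular expressions from the question, note that m.group(0) already returns a plain str (with a leading and trailing .* it is the whole file name), so the regex is most likely not what produces the garbage. The file handling is the more probable cause: mode "w+" truncates history.txt every time the script starts, history.read() consumes the whole file so later calls return an empty string, and calling read() right after write() on the same handle, with no seek() or flush() in between, is undefined behavior in the C stdio layer that Python 2 file objects are built on; on Windows that can dump stray buffer contents into the file. The names are also written without a newline. Below is a minimal sketch of one way to restructure it, assuming history.txt holds one file name per line. load_history, find_matches, main and HISTORY_FILE are hypothetical names, not part of either original script. It only uses single-argument print(...) calls, so it should run on Python 2 and 3.

```python
import os
import re

WATCH_DIR = "D:/tivo"            # directory from the question
HISTORY_FILE = "history.txt"     # hypothetical constant: one file name per line
TO_FIND = [".*big.brother.uk.s15.*", ".*mock.the.week.*", ".*family.guy.*"]


def load_history(path):
    """Hypothetical helper: return the set of names already recorded."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return set(line.strip() for line in fh if line.strip())


def find_matches(files, patterns):
    """Hypothetical helper: return the names (plain str) matched by any pattern."""
    regexes = [re.compile(p) for p in patterns]
    matched = []
    for name in files:
        for regex in regexes:
            m = regex.search(name)
            if m:
                # m.group(0) is the matched text as a str; with ".*" on both
                # ends of the pattern it is the whole file name
                matched.append(m.group(0))
                break  # record each file once, even if several patterns match
    return matched


def main(watch_dir=WATCH_DIR, history_file=HISTORY_FILE):
    """Hypothetical entry point: report and record names not seen before."""
    seen = load_history(history_file)
    matches = find_matches(sorted(os.listdir(watch_dir)), TO_FIND)
    new = [name for name in matches if name not in seen]
    # Append mode: this handle is never read from, so reads and writes are
    # not interleaved, and each name goes on its own line.
    with open(history_file, "a") as fh:
        for name in new:
            print("new: %s" % name)
            fh.write(name + "\n")
    return new
```

Calling main() performs one scan; passing other paths, as in main(some_dir, some_history_file), is handy for testing. Unlike the glob version, this only appends the new names instead of rewriting the whole history file on every run.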
