1rick

Reputation: 51

Select files in directory and move them based on text list of filenames

So I have a folder of a few thousand pdf files in /path, and I have a list of hundreds of names called names.csv (only one column, it could just as easily be .txt).

I'm trying to select (and ideally, move) the pdfs, where any name from names.csv is found in any filename.

From my research so far, it seems like listdir and regex is one approach to at least get a list of the files I want:

import os, sys  
import re 


for files in os.listdir('path'):
    with open('names.csv') as names: 
        for name in names:
            match  = re.search(name, files)

        print match  

But currently this is just returning 'None' 'None' etc, all the way down.

I'm probably doing a bunch of things wrong here. And I'm not even near the part where I need to move the files. But I'm just hoping to get over this first hump.

Any advice is much appreciated!

Upvotes: 1

Views: 3032

Answers (2)

Aran-Fey

Reputation: 43136

The problem is that your name variable always ends with a newline character \n. The newline character isn't present in the file names, so regex doesn't find any matches.
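To see the effect in isolation, here is a small sketch. It uses io.StringIO as a stand-in for open('names.csv'), and the names and the file name are made up for illustration; it is written for Python 3.

```python
import io
import re

# io.StringIO stands in for open('names.csv'); contents are hypothetical
names_file = io.StringIO("smith\njones\n")
filename = "report_smith_2017.pdf"  # hypothetical file name

results = []
for name in names_file:
    # iterating over a file yields each line with its trailing "\n" intact
    raw_found = re.search(name, filename) is not None
    # stripping the line removes the newline before matching
    clean_found = re.search(name.strip(), filename) is not None
    results.append((name, raw_found, clean_found))

for row in results:
    print(row)
```

The unstripped pattern never matches, because no file name contains a line break; the stripped one matches whenever the name really occurs in the file name.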

There are also a few other small issues with your code:

  • You're opening the names.csv file in each iteration of the loop. It would be more efficient to open the file once, then loop through all files in the directory.
  • Regex isn't necessary here, and in fact can cause problems. If, for example, a line in your csv file looked like "(this isn't a valid regex", then re.search would throw an exception because of the unbalanced parenthesis. This could be fixed by escaping the name with re.escape first, but regex still isn't necessary.
  • Your print match is in the wrong place. Since match is overwritten in each iteration of the loop, and you're printing its value after the loop, you only get to see its last value.
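The regex point can be illustrated with a short sketch. The name and file name below are hypothetical, chosen only because the name contains an unbalanced parenthesis; the code assumes Python 3.

```python
import re

name = "(draft"                   # hypothetical name with a regex metacharacter
filename = "notes_(draft_v2.pdf"  # hypothetical file name

try:
    re.search(name, filename)
    raw_error = None
except re.error as exc:
    # "(draft" is not a valid pattern: the group is never closed
    raw_error = exc

# re.escape turns the name into a pattern that matches it literally
escaped_found = re.search(re.escape(name), filename) is not None

# the plain substring test gives the same answer without any regex
substring_found = name in filename

print(raw_error, escaped_found, substring_found)
```

Both the escaped pattern and the "in" operator find the name, which is why the plain substring test is the simpler choice here.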

The fixed code could look like this:

import os

# open the file, make a list of all names, close the file
with open('names.csv') as names_file:
    # use .strip() to remove trailing whitespace and line breaks;
    # skip blank lines, since '' is "in" every filename
    names = [line.strip() for line in names_file if line.strip()]

for filename in os.listdir('path'):
    for name in names:
        # no need for re.search, just use the "in" operator
        if name in filename:
            # move the file, keeping its name in the destination directory
            os.rename(os.path.join('path', filename),
                      os.path.join('/path/to/somewhere/else', filename))
            break
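If the destination might be on a different drive or might not exist yet, a slightly more defensive variant could look like the sketch below. The function name move_matching and its parameters are hypothetical, not part of the question; it assumes Python 3 and uses shutil.move, which also works across filesystems.

```python
import os
import shutil


def move_matching(src_dir, dest_dir, names):
    """Move every file in src_dir whose name contains any entry of names.

    Returns the list of file names that were moved.
    """
    os.makedirs(dest_dir, exist_ok=True)  # create the target folder if needed
    moved = []
    for filename in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, filename)
        if not os.path.isfile(src):
            continue  # skip subdirectories
        # any() stops at the first matching name, like the "break" above
        if any(name in filename for name in names):
            shutil.move(src, os.path.join(dest_dir, filename))
            moved.append(filename)
    return moved
```

It would be called with the stripped list of names, for example move_matching('path', '/path/to/somewhere/else', names). The match is case-sensitive; comparing name.lower() against filename.lower() would be one way to relax that.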

Upvotes: 1

Stephen Miller

Reputation: 522

You say that your names.csv has one column. That means each name read from the file ends with a newline character, which becomes part of the pattern when matching. You could strip it before searching:

match  = re.search(name.rstrip(), files)

Hope it helps.

Upvotes: 0
