Get file paths of a folder and save them in a CSV file in Python

Question

I am using Python 2.7 to get file paths and save them in a CSV (comma-separated values) file. The file should list the path of every file in each subfolder. Each path is followed by a separator (a semicolon in the example below) and then the number of its folder, without the leading letter. For example, the image 1.pgm is stored in folder s14, so its line in the CSV file looks like

at/s14/1.pgm;14

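Here, at/s14/1.pgm is the full path to the file 1.pgm, and 14 is the number (called the label) taken from its folder name s14. Could you help me implement this in Python? I tried the following code, but it does not work for my task: it numbers the folders 0, 1, 2, ... in the order it visits them instead of reading the number from each folder's name.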

#!/usr/bin/env python

import sys
import os.path
#  |-- s14
#  |   |-- 1.pgm
#  |   |-- ...
#  |   |-- 10.pgm
#  |-- s20
#  |   |-- 1.pgm
#  |   |-- ...
#  |   |-- 10.pgm
#  ...
#  |-- s40
#  |   |-- 1.pgm
#  |   |-- ...
#  |   |-- 10.pgm
#

if __name__ == "__main__":

    if len(sys.argv) != 2:
        print "usage: create_csv <base_path>"
        sys.exit(1)

    BASE_PATH=sys.argv[1]
    SEPARATOR=";"

    label = 0
    for dirname, dirnames, filenames in os.walk(BASE_PATH):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                print "%s%s%d" % (abs_path, SEPARATOR, label)
            label = label + 1

The expected output is

at/s14/1.pgm;14
at/s14/2.pgm;14
....
at/s14/10.pgm;14
at/s20/1.pgm;20
at/s20/2.pgm;20
....
at/s20/10.pgm;20
....
at/s40/1.pgm;40
at/s40/2.pgm;40
....
at/s40/10.pgm;40

Here, at is the root folder that contains the subfolders s14, s20, and so on. On Windows, its path looks like "E:\at".
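The labelling rule described above (folder s14 gives label 14) does not strictly need a regular expression. A minimal sketch, assuming every label folder is a single letter followed only by digits; label_from_folder and csv_line are hypothetical helper names, not part of any library:

```python
import os


def label_from_folder(folder_name):
    """Return the number in a folder name such as 's14' (gives 14).

    Assumes the name is one letter followed only by digits and
    returns None for anything else (e.g. 'README' or 'sx').
    """
    digits = folder_name[1:]
    if folder_name[:1].isalpha() and digits.isdigit():
        return int(digits)
    return None


def csv_line(path, separator=";"):
    """Build 'path;label' for a file, taking the label from its parent folder."""
    folder = os.path.basename(os.path.dirname(path))
    label = label_from_folder(folder)
    if label is None:
        raise ValueError("no label in folder name: %r" % folder)
    return "%s%s%d" % (path, separator, label)


print(csv_line("at/s14/1.pgm"))  # at/s14/1.pgm;14
```

Note that os.path splits paths using the separators of the system it runs on, so a Windows path such as E:\at\s14\1.pgm is split correctly when the script runs on Windows, but not when the same string is processed on Linux.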

devonlazarus · Accepted Answer

You are very close to the solution. You should look up Python's regular expression module, re, here:

https://docs.python.org/2/library/re.html

or -

https://docs.python.org/3.4/library/re.html
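The code below relies on two behaviours of re.search: group(1) returns the text captured by the first pair of parentheses, and the call returns None when nothing matches, in which case calling .group() on the result raises AttributeError. A quick illustration on hypothetical folder paths:

```python
import re

# re.search scans the whole string and returns a match object, or None.
match = re.search(r"s([0-9]+)", "at/s14")
print(match.group(0))  # s14  (the whole match)
print(match.group(1))  # 14   (the first parenthesised group)

# No 's' followed by digits here, so the result is None; check it
# before calling .group() on it.
missing = re.search(r"s([0-9]+)", "at/README")
print(missing is None)  # True
```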

I've really only added one line to your loop (plus import re), and removed a couple, such as the label counter:

#!/usr/bin/env python

import os.path
import re
import sys

if __name__ == "__main__":

    if len(sys.argv) != 2:
        print "usage: create_csv <base_path>"
        sys.exit(1)

    BASE_PATH=sys.argv[1]
    SEPARATOR=";"

    for dirname, dirnames, filenames in os.walk(BASE_PATH):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                label = re.search('s([0-9]+)', subject_path)
                print "%s%s%s" % (abs_path, SEPARATOR, label.group(1))

Run from the folder that contains s14, s20 and s40 (with three images each in this example), this code outputs:

$ ./create_csv.py . > output.csv
./s14/1.pgm;14
./s14/2.pgm;14
./s14/3.pgm;14
./s20/1.pgm;20
./s20/2.pgm;20
./s20/3.pgm;20
./s40/1.pgm;40
./s40/2.pgm;40
./s40/3.pgm;40
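Neither os.walk nor os.listdir promises any particular order, and a plain sorted() would put 10.pgm before 2.pgm. If you need the order shown in the question's expected output (1.pgm, 2.pgm, ..., 10.pgm; s14 before s20), one option is a "natural" sort key that compares runs of digits as integers. A minimal sketch; natural_key is a hypothetical helper name:

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as integers ('2.pgm' < '10.pgm')."""
    parts = re.split(r"([0-9]+)", name)
    # re.split with a capturing group returns text and digit runs
    # alternately, so the odd indexes always hold the digit runs.
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


names = ["10.pgm", "2.pgm", "1.pgm"]
print(sorted(names))                   # ['1.pgm', '10.pgm', '2.pgm']
print(sorted(names, key=natural_key))  # ['1.pgm', '2.pgm', '10.pgm']
```

In the loops above, this would mean iterating over sorted(dirnames, key=natural_key) and sorted(os.listdir(subject_path), key=natural_key).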

Quick note: strictly speaking, this isn't a CSV, because the fields are separated by ';'. To get comma-separated values, set SEPARATOR to ','.
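An alternative to formatting the lines by hand is Python's csv module, which takes the delimiter as a parameter and quotes any field that contains it (for example a path with a comma in a folder name). A minimal sketch, written for Python 3; write_rows is a hypothetical helper name and the (path, label) pairs are illustrative:

```python
import csv
import io


def write_rows(rows, out, delimiter=","):
    """Write (path, label) pairs to the file-like object `out`."""
    # lineterminator="\n" replaces the csv module's default "\r\n".
    writer = csv.writer(out, delimiter=delimiter, lineterminator="\n")
    for path, label in rows:
        writer.writerow([path, label])


buf = io.StringIO()
write_rows([("at/s14/1.pgm", 14), ("at/s20/1.pgm", 20)], buf, delimiter=";")
print(buf.getvalue())
# at/s14/1.pgm;14
# at/s20/1.pgm;20
```

To write a real file in Python 3, open it with newline="" as the csv documentation recommends, e.g. with open("output.csv", "w", newline="") as f: write_rows(rows, f). Python 2.7's csv module expects a file opened in "wb" mode instead.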

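Also, as an exercise: since you're running this in a loop and don't know in advance how many directories and files there are, you should compile the regular expression once, before the loop, for speed. Hint: re.compile(). (The re module does cache recently used patterns, so the gain is modest, but compiling once is still good practice.)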
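A minimal sketch of that exercise, assuming the file paths have already been collected in a list; label_rows is a hypothetical helper name and the pattern is the answer's own. The pattern object is built once and reused for every path, and paths without a match are skipped instead of raising AttributeError:

```python
import re

# Compiled once, at import time, instead of inside the loop.
LABEL_RE = re.compile(r"s([0-9]+)")


def label_rows(paths):
    """Return (path, label) pairs, skipping paths with no 's<digits>' part."""
    rows = []
    for path in paths:
        # Same matching as re.search(pattern, path), without the
        # per-call lookup in re's pattern cache.
        match = LABEL_RE.search(path)
        if match is not None:
            rows.append((path, int(match.group(1))))
    return rows


print(label_rows(["./s14/1.pgm", "./s20/2.pgm", "./README"]))
# [('./s14/1.pgm', 14), ('./s20/2.pgm', 20)]
```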

Also, the regular expression is a bit loose: it matches the first 's' followed by digits anywhere in the path. You could tighten it with anchors, for example a '/' on each side, to make sure you capture only a whole folder name. For help defining that regular expression, check out https://www.regex101.com/
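The difference the anchors make shows up on a hypothetical path whose parent folder name happens to contain "s2". The sketch compares the answer's pattern, the '/'-anchored version suggested above, and a third variant (an assumption, not from the answer) that anchors with ^ and $ on the folder name alone; that variant leaves path splitting to os.path, which also handles Windows backslashes when the script runs on Windows:

```python
import os
import re

LOOSE = re.compile(r"s([0-9]+)")      # the answer's pattern
SLASHED = re.compile(r"/s([0-9]+)/")  # '/' on each side (POSIX-style paths)
FOLDER = re.compile(r"^s([0-9]+)$")   # the whole folder name must be s<digits>


def first_group(match):
    """Return group 1 of a match, or None if there was no match."""
    return match.group(1) if match else None


path = "data/subjects2/s14/1.pgm"                 # hypothetical path
folder = os.path.basename(os.path.dirname(path))  # 's14'

print(first_group(LOOSE.search(path)))    # 2  (wrong: matched 's2' in 'subjects2')
print(first_group(SLASHED.search(path)))  # 14
print(first_group(FOLDER.match(folder)))  # 14
```

Note that the '/'-anchored pattern needs a '/' before the folder name, so a relative path such as s14/1.pgm (no leading './') would not match it.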
