I am using Python 2.7 to collect file paths and save them in a CSV (comma-separated values) file. The file should list every file in every subfolder. Each line holds the path to a file, then a separator, then the number of the folder the file is in, without the leading letter. For example, I have an image 1.pgm stored in folder s14. Its line in the CSV then looks like
at/s14/1.pgm;14
Here at/s14/1.pgm is the full path to the file 1.pgm, and 14 is the number (called the label) taken from its folder name s14. Could you help me implement this in Python? I tried the following code, but it does not do what my task needs:
#!/usr/bin/env python
import sys
import os.path
# |-- s14
# |   |-- 1.pgm
# |   |-- ...
# |   |-- 10.pgm
# |-- s20
# |   |-- 1.pgm
# |   |-- ...
# |   |-- 10.pgm
# ...
# |-- s40
# |   |-- 1.pgm
# |   |-- ...
# |   |-- 10.pgm
#
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "usage: create_csv <base_path>"
        sys.exit(1)
    BASE_PATH=sys.argv[1]
    SEPARATOR=";"
    label = 0
    for dirname, dirnames, filenames in os.walk(BASE_PATH):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                print "%s%s%d" % (abs_path, SEPARATOR, label)
            label = label + 1
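In the attempt above, label is a counter: it starts at 0 and goes up by one after each subfolder, so the labels come out as 0, 1, 2, ... whatever the folders are called. (os.walk also does not guarantee the order in which subfolders are visited, so the counter values are not even reliably tied to particular folders.) The small sketch below contrasts the counter with reading the number from the folder name; the list of names is simply the one from the tree comment, hard-coded for illustration.

```python
# Folder names from the tree comment in the question (hard-coded for illustration).
folders = ["s14", "s20", "s40"]

# What the attempt does: one running counter value per subfolder, starting at 0.
counter_labels = list(range(len(folders)))

# What the task asks for: the number that follows the leading "s" in each name.
name_labels = [int(name[1:]) for name in folders]

print(counter_labels)  # [0, 1, 2]
print(name_labels)     # [14, 20, 40]
```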
The expected output is:
at/s14/1.pgm;14
at/s14/2.pgm;14
....
at/s14/10.pgm;14
at/s20/1.pgm;20
at/s20/2.pgm;20
....
at/s20/10.pgm;20
....
at/s40/1.pgm;40
at/s40/2.pgm;40
....
at/s40/10.pgm;40
Here at is the root folder that contains the subfolders s14, s20, and so on. On Windows, it looks like "E:\at".
You are very close to the solution. Take a look at Python's regular expression module, re, here:
https://docs.python.org/2/library/re.html
https://docs.python.org/3.4/library/re.html
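For reference, here is a small sketch of the two calls the solution relies on, re.search() and the match object's group() method, applied to a path in the question's layout. The second path ("at/readme") is a made-up example of a folder with no number in it.

```python
import re

# re.search scans the string and returns a match object for the first hit.
match = re.search('s([0-9]+)', 'at/s14')
print(match.group(0))  # s14 (the whole match)
print(match.group(1))  # 14  (the text captured by the first parentheses)

# When nothing matches, re.search returns None; calling .group() on None
# would raise AttributeError, so check the result before using it.
missing = re.search('s([0-9]+)', 'at/readme')
print(missing is None)  # True
```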
I've really only added one line to your app (and removed a couple):
#!/usr/bin/env python
import os.path
import re
import sys
if __name__ == "__main__":
    if len(sys.argv) != 2:
        print "usage: create_csv <base_path>"
        sys.exit(1)
    BASE_PATH=sys.argv[1]
    SEPARATOR=";"
    for dirname, dirnames, filenames in os.walk(BASE_PATH):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                label = re.search('s([0-9]+)', subject_path)
                print "%s%s%s" % (abs_path, SEPARATOR, label.group(1))
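The script above is Python 2 (print is a statement there). For anyone on Python 3, a near line-for-line port might look like the sketch below. Besides print(), it makes two small changes, both assumptions rather than part of the original answer: the lookup is moved into helpers with illustrative names (label_for, csv_lines, main), and a folder whose path contains no "s" plus digits is skipped instead of crashing, since re.search returns None in that case.

```python
#!/usr/bin/env python3
# Sketch of a Python 3 port of the script above (helper names are illustrative).
import os
import re
import sys

SEPARATOR = ";"


def label_for(subject_path):
    """Return the digits after "s" in the path, or None if there are none."""
    match = re.search('s([0-9]+)', subject_path)
    return match.group(1) if match else None


def csv_lines(base_path):
    """Yield one "path;label" line per entry inside each subfolder."""
    for dirname, dirnames, filenames in os.walk(base_path):
        for subdirname in dirnames:
            subject_path = os.path.join(dirname, subdirname)
            label = label_for(subject_path)
            if label is None:
                continue  # no number in this folder's path: skip it
            for filename in os.listdir(subject_path):
                abs_path = "%s/%s" % (subject_path, filename)
                yield "%s%s%s" % (abs_path, SEPARATOR, label)


def main(argv):
    """Print the lines for argv[1], or a usage message if it is missing."""
    if len(argv) != 2:
        print("usage: create_csv <base_path>", file=sys.stderr)
        return
    for line in csv_lines(argv[1]):
        print(line)


if __name__ == "__main__":
    main(sys.argv)
```

Usage would be the same as before, for example python3 create_csv.py at > output.csv.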
Run as shown below, it writes lines like these to output.csv:
$ ./create_csv.py . > output.csv
./s14/1.pgm;14
./s14/2.pgm;14
./s14/3.pgm;14
./s20/1.pgm;20
./s20/2.pgm;20
./s20/3.pgm;20
./s40/1.pgm;40
./s40/2.pgm;40
./s40/3.pgm;40
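On Windows, where the root folder is "E:\at", os.path.join(dirname, subdirname) produces backslashes, but the file name is then appended with "%s/%s", so each line mixes separators. Windows APIs usually accept forward slashes as well, so this may be harmless, but building the whole path with os.path.join keeps it consistent. The sketch below shows the difference on any operating system by using ntpath and posixpath, the Windows and POSIX flavours of os.path; the paths themselves are just the question's example.

```python
import ntpath
import posixpath

# ntpath is what os.path is on Windows; it can be imported on any system.
subject_path = ntpath.join("E:\\at", "s14")  # what os.path.join gives on Windows
mixed = "%s/%s" % (subject_path, "1.pgm")    # the formatting used in the answer
joined = ntpath.join(subject_path, "1.pgm")  # os.path.join all the way

print(mixed)   # E:\at\s14/1.pgm
print(joined)  # E:\at\s14\1.pgm

# On Linux or macOS (posixpath), the same join gives forward slashes.
print(posixpath.join("at", "s14", "1.pgm"))  # at/s14/1.pgm
```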
Quick note: this really isn't a CSV. You would need to replace ';' with ','.
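Instead of swapping the separator by hand, Python's csv module can write real comma-separated rows, and it also quotes any field that itself contains a comma. A minimal Python 3 sketch, assuming rows of (path, label) pairs; write_rows is a hypothetical helper name and the rows are sample data in the question's layout.

```python
import csv
import io


def write_rows(rows, out):
    """Write (path, label) pairs as comma-separated lines to a text stream."""
    # lineterminator="\n" keeps the output simple; the csv default is "\r\n".
    writer = csv.writer(out, lineterminator="\n")
    for path, label in rows:
        writer.writerow([path, label])


# Sample rows in the question's layout (illustrative only).
rows = [("at/s14/1.pgm", "14"), ("at/s20/1.pgm", "20")]
buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue(), end="")
# at/s14/1.pgm,14
# at/s20/1.pgm,20
```

To write to a real file, the csv documentation recommends opening it with newline="", for example with open("output.csv", "w", newline="") as f: write_rows(rows, f).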
Also, as an exercise: since you're running this in a loop and don't know in advance how many directories and files there are, you should compile the regular expression once, before the loop, for speed. Hint: re.compile().
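A sketch of that exercise, reusing the same pattern as the answer. One caveat: the re module already caches recently used patterns internally, so the speed gain from compiling by hand is usually modest; the main benefit is that the pattern is built once and given a name. LABEL_RE and the sample paths are illustrative.

```python
import re

# Compile once, outside the loop, then reuse the pattern object for every path.
LABEL_RE = re.compile(r's([0-9]+)')

paths = ["at/s14", "at/s20", "at/s40"]  # sample subfolder paths (illustrative)
labels = [LABEL_RE.search(p).group(1) for p in paths]
print(labels)  # ['14', '20', '40']
```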
Also, the regular expression is a bit loose. You could tighten that up with anchors of '/' on each side to make sure you're only capturing what you want. For help in defining that regular expression, check out https://www.regex101.com/
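One way to tighten it, sketched under the assumption that the label folder is always a whole path component: require the "s" to follow the start of the string or a separator, and require the digits to be followed by a separator or the end. Both "/" and "\" are accepted so Windows paths work too. The folder name "photos2" is a made-up example of a parent directory that fools the loose pattern.

```python
import re

LOOSE = re.compile(r's([0-9]+)')
# "s" + digits only as a whole path component: preceded by the start of the
# string or a separator, followed (lookahead) by a separator or the end.
TIGHT = re.compile(r'(?:^|[/\\])s([0-9]+)(?=[/\\]|$)')

path = "photos2/s14"  # hypothetical parent folder containing "s2"
print(LOOSE.search(path).group(1))           # 2  (wrong folder)
print(TIGHT.search(path).group(1))           # 14
print(TIGHT.search("E:\\at\\s14").group(1))  # 14
print(TIGHT.search("at/s14x") is None)       # True
```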