Reputation: 23
My goal is to generate a CSV file that lists each project name and the documents associated with it. The project name is the folder name (e.g. Project1, Project2), and the documents are the files located in that folder.
C:\SHH\Testenv
C:\SHH\Testenv\Project1
C:\SHH\Testenv\Project2
C:\SHH\Testenv\Project1\test.txt
C:\SHH\Testenv\Project1\test.ppt
C:\SHH\Testenv\Project2\payroll.ppt
import os
import xlwt
import csv
from os import walk
path = r'C:\SHH\Testenv'  # raw string so the backslashes are taken literally
folders = [] # list that will contain folder names (basically the project names)
pathf = [] # list that will contain the directory of each folder
files = [] # list of files in a folder (basically documents for each project)
for item in os.listdir(path):
    if not os.path.isfile(os.path.join(path, item)):
        folders.append(os.path.join(item))
        pathf.append(os.path.join(path,item))
for x in pathf :
    for (dirpath, dirnames, filenames) in walk(x):
        files.extend(filenames)
print files
I am stuck on associating each file with its respective folder and then writing the result to a CSV file.
Thank you in advance.
Upvotes: 2
Views: 3383
Reputation: 168616
os.walk and csv.writer are your friends in this task:
import os
import csv

path = '/tmp/SSH/Testenv'
with open('/tmp/output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Project Name', 'Documents'])
    for dirpath, _, filenames in os.walk(path):
        if filenames:
            writer.writerow([os.path.basename(dirpath)] + filenames)
Or, if you prefer generator expressions:
with open('/tmp/output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Project Name', 'Documents'])
    writer.writerows(
        [os.path.basename(dirpath)] + filenames
        for dirpath, _, filenames in os.walk(path)
        if filenames)
Result:
Project Name,Documents
Project2,payroll.ppt
Project1,test.ppt,test.txt
To get sorted output, sort dirs in place (so os.walk visits the folders in order) and sort each list of filenames:
with open('/tmp/output.csv', 'wb') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(['Project Name', 'Documents'])
    for dirpath, dirs, filenames in os.walk(path, topdown=True):
        dirs.sort()
        if filenames:
            writer.writerow([os.path.basename(dirpath)] + sorted(filenames))
Result:
Project Name,Documents
Project1,test.ppt,test.txt
Project2,payroll.ppt
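The snippets above open the output file in 'wb' mode, which is how csv files are opened in Python 2. For readers on Python 3, here is a minimal sketch of the same os.walk plus csv.writer idea, assuming text mode with newline=''. The function name write_project_csv and its parameters are made up for illustration and are not part of the original answer.

```python
import csv
import os


def write_project_csv(root, out_path):
    """Write one CSV row per non-empty folder under root: folder name, then its files."""
    # newline='' lets the csv module control line endings (Python 3 usage).
    with open(out_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Project Name', 'Documents'])
        for dirpath, dirs, filenames in os.walk(root):
            dirs.sort()  # in-place sort fixes the order in which os.walk descends
            if filenames:
                writer.writerow([os.path.basename(dirpath)] + sorted(filenames))
```

Keep out_path outside of root; otherwise the output file itself may be picked up as a document of the top-level folder.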
Upvotes: 3
Reputation: 6420
It is probably easier to handle a project/directory completely before going on to the next one. In addition, it seems like a dictionary is the ideal structure.
import os

path = r'C:\SHH\Testenv'
projects = {}
for item in os.listdir(path):
    current = os.path.join(path, item)
    if os.path.isdir(current):
        projects[item] = []
        for f in os.listdir(current):
            if os.path.isfile(os.path.join(current, f)):
                projects[item].append(f)
f = open('projects.csv', 'w')
f.write('Project Name____Documents\n')
for p in projects:
    f.write(p + '____' + '____'.join(projects[p]) + '\n')
f.close()
The first step is to get the top-level directories, meaning the projects (checked with os.path.isdir()). We create an entry in the dict for each project, holding an empty list. Next, we list all the files in that project directory and add them to the list.
Since you don't really have a typical CSV structure, I just used normal file I/O. The project name and each document are separated by four underscores, but you can easily adjust that.
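If a more typical CSV structure is wanted, one option is to write one row per (project, document) pair, so that every row has the same two columns. The following is only a sketch of that variation on the dictionary approach, written for Python 3. The helper names collect_projects and write_pairs and the 'Document' column header are assumptions made for this example.

```python
import csv
import os


def collect_projects(root):
    """Map each top-level folder name under root to a sorted list of its files."""
    projects = {}
    for item in sorted(os.listdir(root)):
        current = os.path.join(root, item)
        if os.path.isdir(current):
            projects[item] = sorted(
                name for name in os.listdir(current)
                if os.path.isfile(os.path.join(current, name))
            )
    return projects


def write_pairs(projects, out_path):
    """Write one (project, document) row per file."""
    with open(out_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Project Name', 'Document'])
        for project, documents in projects.items():
            for document in documents:
                writer.writerow([project, document])
```

A call such as write_pairs(collect_projects(path), 'projects.csv') would then produce a file that spreadsheet tools can filter or group by project.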
Upvotes: 1
Reputation: 29580
Try:
from os import walk, listdir
from os.path import join, isfile

path = r'C:\SHH\Testenv'
# use walk
for (dirpath, dirnames, filenames) in walk(path):
    # at every directory, check if there is at least one file
    # i.e. check that it is neither empty nor full of other directories
    files_found = False
    # listdir is imported directly, so call it without the os. prefix
    for dir_f in listdir(dirpath):
        if isfile(join(dirpath, dir_f)):
            files_found = True
            break
    # if we found at least one file, output csv-style format
    if files_found:
        print dirpath + "," + ",".join([f for f in listdir(dirpath) if isfile(join(dirpath, f))])
Also notice the difference between os.path.join(), which joins path components, and str.join(), used here as ",".join(...), which joins a sequence of strings with a delimiter, a comma (,) in this case.
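To make that distinction concrete, here is a small illustration. The variable names doc_path and csv_line are chosen just for this example.

```python
import os

# os.path.join builds a path from components using the platform's separator,
# so the result differs between Windows ('\\') and POSIX systems ('/').
doc_path = os.path.join('Project1', 'test.txt')

# str.join glues a sequence of strings together with the given delimiter.
csv_line = ','.join(['Project1', 'test.ppt', 'test.txt'])
print(csv_line)  # prints: Project1,test.ppt,test.txt
```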
Upvotes: 0