Ottoman

Reputation: 23

Python script to generate CSV file with foldernames including their associated files

My goal is to generate a CSV file that lists each project name and the documents associated with it. The project name is the folder name (e.g. Project1, Project2), and the documents are the files located in that folder.

Ideal output of CSV file

Folder Structure

C:\SHH\Testenv

C:\SHH\Testenv\Project1

C:\SHH\Testenv\Project2

C:\SHH\Testenv\Project1\test.txt

C:\SHH\Testenv\Project1\test.ppt

C:\SHH\Testenv\Project2\payroll.ppt

Code I have Tried

import os
import xlwt 
import csv 
from os import walk

path = 'C:\SHH\Testenv'  
folders = [] # list that will contain folder names (basically the project names)
pathf = [] # list that will contain the directory of each folder 
files = [] # list of files in a folder (basically documents for each project) 

for item in os.listdir(path):
    if not os.path.isfile(os.path.join(path, item)):
        folders.append(os.path.join(item)) 
    pathf.append(os.path.join(path,item)) 

for x in pathf : 
    for (dirpath, dirnames, filenames) in walk(x):
        files.extend(filenames)
        print files

I am stuck at associating each file with its respective folder and then writing the result to a CSV file.

Thank you in advance

Upvotes: 2

Views: 3383

Answers (3)

Robᵩ

Reputation: 168616

os.walk and csv.writer are your friends in this task:

import os
import csv

path = '/tmp/SSH/Testenv'

with open('/tmp/output.csv', 'wb') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  for dirpath, _, filenames in os.walk(path):
    if filenames:
      writer.writerow([os.path.basename(dirpath)] + filenames)

Or, if you prefer generator expressions:

with open('/tmp/output.csv', 'wb') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  writer.writerows(
    [os.path.basename(dirpath)]+filenames
    for dirpath,_,filenames in os.walk(path)
    if filenames)

Result:

Project Name,Documents
Project2,payroll.ppt
Project1,test.ppt,test.txt


EDIT: It bothered me that the output wasn't sorted. Here is a version in which the projects are sorted, and the files are sorted in each project:

with open('/tmp/output.csv', 'wb') as csvfile:
  writer = csv.writer(csvfile)
  writer.writerow(['Project Name', 'Documents'])
  for dirpath, dirs, filenames in os.walk(path, topdown=True):
    dirs.sort()
    if filenames:
      writer.writerow([os.path.basename(dirpath)] + sorted(filenames))

Result:

Project Name,Documents
Project1,test.ppt,test.txt
Project2,payroll.ppt
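
For readers on Python 3, the sorted version might look roughly like the sketch below. It assumes Python 3, where the csv module expects a text-mode file opened with newline='' rather than 'wb'. The function name write_project_csv and its two parameters are hypothetical and are not part of the answer above.

```python
import csv
import os


def write_project_csv(root, out_path):
    """Write one CSV row per directory under root that contains files."""
    # Python 3: open in text mode with newline='' so csv controls line endings.
    with open(out_path, 'w', newline='') as csvfile:
        writer = csv.writer(csvfile)
        writer.writerow(['Project Name', 'Documents'])
        for dirpath, dirs, filenames in os.walk(root, topdown=True):
            # Sorting dirs in place makes os.walk visit subfolders in order.
            dirs.sort()
            if filenames:
                writer.writerow([os.path.basename(dirpath)] + sorted(filenames))
```

It could be called as write_project_csv(r'C:\SHH\Testenv', 'output.csv'). Keeping the output file outside the walked tree avoids listing the CSV itself as a document.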

Upvotes: 3

Tim Zimmermann

Reputation: 6420

It is probably easier to handle a project/directory completely before going on to the next one. In addition, it seems like a dictionary is the ideal structure.

import os

path = r'C:\SHH\Testenv'  # raw string so the backslashes are taken literally
projects = {}

for item in os.listdir(path):
    current = os.path.join(path, item)
    if os.path.isdir(current):
        projects[item] = []
        for f in os.listdir(current):
            if os.path.isfile(os.path.join(current, f)):
                projects[item].append(f)

# the with block closes the file automatically
with open('projects.csv', 'w') as f:
    f.write('Project Name____Documents\n')
    for p in projects:
        f.write(p + '____' + '____'.join(projects[p]) + '\n')

The first step is to get the root directories, meaning the projects (os.path.isdir()). We create an entry in the dict for the project, holding an empty list. Next, we list all the files in this project directory and add them to the list.
Since you don't really have a typical CSV structure, I just used normal file I/O. The project name and each document are separated by four underscores, but you can easily adjust that.
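
One way to make the separator adjustable is sketched below. This is a minimal sketch assuming Python 3. The helper names collect_projects and format_rows and the sep parameter are hypothetical, introduced only for illustration. Unlike the code above, it sorts the projects and files so the output order is predictable.

```python
import os


def collect_projects(root):
    """Map each immediate subfolder of root to a sorted list of its files."""
    projects = {}
    for item in sorted(os.listdir(root)):
        current = os.path.join(root, item)
        if os.path.isdir(current):
            # Keep only regular files, skipping nested directories.
            projects[item] = sorted(
                f for f in os.listdir(current)
                if os.path.isfile(os.path.join(current, f))
            )
    return projects


def format_rows(projects, sep='____'):
    """Return a header line plus one line per project, joined with sep."""
    lines = [sep.join(['Project Name', 'Documents'])]
    for name in sorted(projects):
        lines.append(sep.join([name] + projects[name]))
    return lines
```

Passing sep=',' gives comma-separated lines, which can then be written to a file with '\n'.join(...).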

Upvotes: 1

arturomp

Reputation: 29580

Try:

from os import walk, listdir
from os.path import join, isfile

path = r'C:\SHH\Testenv'

# use walk
for (dirpath, dirnames, filenames) in walk(path):
    # at every directory, check if there is at least one file
    # i.e. check that it is neither empty nor full of other directories
    files_found = False
    for dir_f in listdir(dirpath):  # listdir is imported by name; the os module itself is not
        if isfile(join(dirpath, dir_f)):
            files_found = True
            break

    # if we found at least one file, output csv-style format
    if files_found:
        print(dirpath + "," + ",".join([f for f in listdir(dirpath) if isfile(join(dirpath, f))]))

Also notice the difference between os.path.join(), which joins path components, and str.join(), used here as ",".join(...), which joins a sequence of strings with a delimiter (a comma in this case).
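
A small side-by-side sketch of the two joins, using the file names from the question as sample values:

```python
import os

# os.path.join glues path components together with the platform's separator:
# 'Testenv/Project1/test.txt' on POSIX, backslashes instead on Windows.
full_path = os.path.join('Testenv', 'Project1', 'test.txt')

# str.join glues a sequence of strings together with the string it is called on.
csv_line = ','.join(['Project1', 'test.ppt', 'test.txt'])

print(full_path)
print(csv_line)
```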

Upvotes: 0
