3kstc

Reputation: 1965

How to sequentially read all the files in a directory and export the contents in Python?

I have a directory /directory/some_directory/ and in that directory I have a set of files. Those files are named in the following format: <letter>-<number>_<date>-<time>_<dataidentifier>.log, for example:

ABC1-123_20162005-171738_somestring.log
DE-456_20162005-171738_somestring.log
ABC1-123_20162005-153416_somestring.log
FG-1098_20162005-171738_somestring.log
ABC1-123_20162005-031738_somestring.log
DE-456_20162005-171738_somestring.log

I would like to read a subset of those files (for example, only the files named ABC1-123*.log) and export all their contents to a single CSV file (for example, output.csv), that is, a CSV file that holds the data from all the individual files together.

The code that I have written so far:

#!/usr/bin/env python

import os
file_directory=os.getcwd()

m_class="ABC1"
m_id="123"
device=m_class+"-"+m_id


for data_file in sorted(os.listdir(file_dir)):
    if str(device)+"*" in os.listdir(file_dir):
        print data_file

I don't know how to read only a filtered subset of the files, or how to export them to a common CSV file.

How can I achieve this?

Upvotes: 1

Views: 2024

Answers (4)

Moon Cheesez

Reputation: 2701

You can use the os module to list the files.

import os

files = os.listdir(os.getcwd())

m_class = "ABC1"
m_id = "123"
device = m_class + "-" + m_id
file_extension = ".log"

# filter the files by their extension and the starting name
files = [x for x in files if x.startswith(device) and x.endswith(file_extension)]

# open the output once and append each matching file's contents to it
with open("output.csv", "a") as f:
    for name in files:
        with open(name, "r") as data_file:
            f.write(data_file.read())
        f.write(",\n")

Upvotes: 1

jvdm

Reputation: 876

Your question could be stated more clearly. Based on your current code snippet, I'll assume that you want to:

  1. Filter files in a directory based on glob pattern.
  2. Concatenate their contents to a file named output.csv.

In Python you can achieve (1.) by using the glob module to list filenames.

import glob

for filename in glob.glob('foo*bar'):
    print(filename)

That would print all files starting with foo and ending with bar in the current directory.
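For the files in the question, the same idea might look like the sketch below. The helper name matching_logs is made up for illustration, and I'm assuming you want only names ending in .log from a directory you pass in, returned in sorted order.

```python
import glob
import os


def matching_logs(directory, device):
    """Return sorted paths of files named <device>*.log inside directory."""
    pattern = os.path.join(directory, device + "*.log")
    # glob.glob() makes no ordering promise, so sort for a stable sequence
    return sorted(glob.glob(pattern))
```

Calling matching_logs('/directory/some_directory', 'ABC1-123') would then return only the ABC1-123 logs. Note that this pattern would also match a device such as ABC1-1234; use device + '_*.log' if that matters.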

For (2.) you just read each file and write its content to your desired output, using Python's built-in open() function:

open('filename', 'r')

(With 'r' as the mode you ask Python to open the file for reading; with 'w' you ask it to open the file for writing.)
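A third mode worth knowing here is 'a' (append). The small sketch below shows how the three modes differ; the scratch file demo.txt is hypothetical and exists only for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as handle:   # 'w' creates the file, or empties an existing one
    handle.write("first\n")

with open(path, "a") as handle:   # 'a' keeps existing content and adds to the end
    handle.write("second\n")

with open(path, "r") as handle:   # 'r' opens the file for reading only
    contents = handle.read()

with open(path, "w") as handle:   # reopening with 'w' discards both earlier lines
    handle.write("third\n")

with open(path, "r") as handle:
    after_truncate = handle.read()
```

In practice this means an output file opened with 'a' grows every time the script is rerun, whereas one opened with 'w' starts out empty on each run.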

The final code would look like the following:

import glob

device = 'ABC1-123'
with open('output.csv', 'w') as output:
    # sorted() gives a predictable order; glob.glob() alone promises none
    for filename in sorted(glob.glob(device + '*.log')):
        with open(filename, 'r') as log_file:
            output.write(log_file.read())
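If "sequentially" means in time order, keep in mind that glob.glob() does not promise any order, and sorting the names alphabetically is only chronological when the date field is written year-month-day. Here is a rough sketch that sorts by the timestamp embedded in the name. I'm assuming the date field is year-day-month (so 20162005 is 20 May 2016), which is only a guess from the examples; the function names are made up.

```python
import os
from datetime import datetime


def timestamp_of(path):
    """Parse the <date>-<time> field of a name like ABC1-123_20162005-171738_x.log."""
    stamp = os.path.basename(path).split("_")[1]
    # Assumption: the date field is year, day, month (20162005 -> 20 May 2016)
    return datetime.strptime(stamp, "%Y%d%m-%H%M%S")


def in_time_order(paths):
    """Return paths sorted from the oldest to the newest embedded timestamp."""
    return sorted(paths, key=timestamp_of)
```

You could then loop over in_time_order(glob.glob(device + '*.log')) instead of the raw glob result. If the date field turns out to be in some other layout, only the format string needs to change.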

Upvotes: 1

Merlin

Reputation: 25659

Only a few adjustments are needed; you were close.

import os

device = "ABC1-123"  # same prefix as in the question

filesFromDir = os.listdir(os.getcwd())

fileList = [file for file in filesFromDir if file.startswith(device)]

f = open("LogOutput.csv", "ab")
for file in fileList:
    #print "Processing", file
    with open(file, "rb") as log_file:
        txt = log_file.read()
        f.write(txt)
    # bytes literal, because the output file is opened in binary mode
    f.write(b"\n")

f.close()

Upvotes: 1

Terran

Reputation: 699

Just use the re module to match the file name pattern, and the csv module to export.
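A minimal sketch of that idea, written for Python 3. The question doesn't say what the log lines look like, so I'm assuming each line simply becomes one CSV row next to the name of the file it came from. The regular expression, the function name export_logs and the header names are all made up for illustration.

```python
import csv
import os
import re

# Assumed name layout: <device>_<date>-<time>_<dataidentifier>.log
LOG_NAME = re.compile(r"^(?P<device>[A-Z0-9]+-\d+)_(?P<stamp>\d{8}-\d{6})_.+\.log$")


def export_logs(directory, device, csv_path):
    """Write every line of the device's log files as one CSV row; return the row count."""
    rows = 0
    # newline="" is the setting the csv module recommends on Python 3
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["source_file", "line"])  # hypothetical header names
        for name in sorted(os.listdir(directory)):
            match = LOG_NAME.match(name)
            if not match or match.group("device") != device:
                continue  # not a log file of the requested device
            with open(os.path.join(directory, name), "r") as log_file:
                for line in log_file:
                    writer.writerow([name, line.rstrip("\n")])
                    rows += 1
    return rows
```

For example, export_logs('/directory/some_directory', 'ABC1-123', 'output.csv') would collect only the ABC1-123 logs. Because the device is compared exactly, a file for ABC1-1234 is not picked up, and csv.writer takes care of quoting lines that themselves contain commas.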

Upvotes: 1
