sultan

Reputation: 115

Converting 1000 text files into a single csv file

I want to convert multiple text files into a single CSV file. The text files are named file1.txt, file2.txt, ..., file1000.txt. A text file (file1.txt) has the following format:

Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25 

I want the output to be:

Employee id,Employee name,department,Age
us51243,Mark santosh,engineering,25//(file1.txt values)
...................................//(file2.txt values)

But in the output I am getting the values from file1000.txt only, as follows:

Employee id,Employee name,department,Age
us98621,Andy Gonzalez,Support & services,25

Here is my code:

import csv
import os
for x in range(1,1001):
    filepath=os.path.normpath('C:\\Text\\file{}.txt'.format(x))
    with open(filepath) as f, open('emp.csv', 'w',newline='') as file:
        writer = csv.writer(file)
        val = zip(*[l.rstrip().split(': ') for l in f])
        writer.writerows(val)

Kindly note: I also want to display the header (Employee id, Employee name, department, Age) only once.

Upvotes: 2

Views: 4249

Answers (3)

Anton vBR

Reputation: 18906

First let's create two files:

s1 = u"""Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25"""

s2 = u"""Employee id: us51244
Employee name: Any santosh
department:engineering
Age:24"""

with open("file1.txt", "w") as f:
    f.write(s1)

with open("file2.txt", "w") as f:
    f.write(s2)

Now let's use pandas:

import pandas as pd

# Filelist
filelist = ["file1.txt","file2.txt"]

# Create dataframe
df = pd.DataFrame(columns=["Employee id","Employee name","department","Age","file"])


# Loop through files
for ind,file in enumerate(filelist):
    data = pd.read_csv(file, header=None, sep=":").iloc[:,1] 
    df.loc[ind] = data.tolist() + [file]

df

Output:

  Employee id Employee name   department Age       file
0     us51243  Mark santosh  engineering  25  file1.txt
1     us51244   Any santosh  engineering  24  file2.txt
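
Since the question asks for a single emp.csv, the dataframe can then be written out with to_csv. A minimal sketch, assuming a reasonably recent pandas (the file column is only a helper here, so it is dropped before saving):

# Drop the helper "file" column and write the collected rows to emp.csv
df.drop(columns=["file"]).to_csv("emp.csv", index=False)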

Upvotes: 0

Martin Evans

Reputation: 46759

You are currently reopening your output file for each new text file, which causes all of its contents to be overwritten. Also, you could use the csv library to read your text files too, by specifying the delimiter as : and skipping any extra spaces:

import csv
import os

header = ["Employee id", "Employee name", "department", "Age"]

with open('emp.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(header)

    for x in range(1, 1001):
        filepath = os.path.normpath(r'C:\Text\file{}.txt'.format(x))

        with open(filepath, 'r', newline='') as f_text:
            csv_text = csv.reader(f_text, delimiter=':', skipinitialspace=True)
            csv_output.writerow(row[1] for row in csv_text)
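
As a quick sanity check of the parsing step: csv.reader accepts any iterable of strings, so the sample record from the question (which mixes ": " and ":") can be fed to it directly. A small sketch:

import csv

# Sample record taken from the question; note the mix of ": " and ":"
lines = ["Employee id: us51243", "Employee name: Mark santosh",
         "department:engineering", "Age:25"]

reader = csv.reader(lines, delimiter=':', skipinitialspace=True)
print([row[1] for row in reader])
# -> ['us51243', 'Mark santosh', 'engineering', '25']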

Upvotes: 2

Adirio

Reputation: 5286

Try the following:

import csv
import os

FIELDS = ('Employee id', 'Employee name', 'department', 'Age')

def read_file(file, keys):
    # Parse one "key: value" record into a dict; partition tolerates a
    # missing space after the colon (e.g. "department:engineering").
    output = dict.fromkeys(keys)
    for line in file:
        key, _, value = line.rstrip().partition(':')
        output[key] = value.strip()
    return output

with open('emp.csv', 'w', newline='') as destiny:
    writer = csv.DictWriter(destiny, FIELDS)
    writer.writeheader()
    for x in range(1, 1001):
        with open(os.path.normpath('C:\\test\\file{}.txt'.format(x))) as origin:
            writer.writerow(read_file(origin, FIELDS))
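
For reference, read_file accepts any iterable of lines, so it can be tried on the sample record from the question without opening a file; a small sketch:

sample = ["Employee id: us51243", "Employee name: Mark santosh",
          "department:engineering", "Age:25"]
print(read_file(sample, FIELDS))
# -> {'Employee id': 'us51243', 'Employee name': 'Mark santosh',
#     'department': 'engineering', 'Age': '25'}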

Upvotes: 0
