Reputation: 115
I want to convert multiple text files into a single csv file.the text are named as(file1.txt,file2.txt....file1000.txt). A text file(file1.txt) format is as follows:
Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25
I want the ouput as:
Employee id,Employee name,department,Age
us51243,Mark santosh,engineering,25//(file1.txt values)
...................................//(file2.txt values)
But in the ouput I am getting the value of file1000.txt only as follows:
Employee id,Employee name,department,Age
us98621,Andy Gonzalez,Support & services,25
Here is my code:
import csv
import os
for x in range(1,1001):
filepath=os.path.normpath('C:\\Text\\file{}.txt'.format(x))
with open(filepath) as f, open('emp.csv', 'w',newline='') as file:
writer = csv.writer(file)
val = zip(*[l.rstrip().split(': ') for l in f])
writer.writerows(val)
Kindly note:Also I want to display the header(Employee id,Employee name,Department,Age) only once
Upvotes: 2
Views: 4249
Reputation: 18906
First let's create two files:
s1 = u"""Employee id: us51243
Employee name: Mark santosh
department:engineering
Age:25"""
s2 = u"""Employee id: us51244
Employee name: Any santosh
department:engineering
Age:24"""
with open("file1.txt", "w") as f:
f.write(s1)
with open("file2.txt", "w") as f:
f.write(s2)
Now let's use pandas:
import pandas as pd
# Filelist
filelist = ["file1.txt","file2.txt"]
# Create dataframe
df = pd.DataFrame(columns=["Employee id","Employee name","department","Age","file"])
# Loop through files
for ind,file in enumerate(filelist):
data = pd.read_csv(file, header=None, sep=":").iloc[:,1]
df.loc[ind] = data.tolist() + [file]
df
Output:
Employee id Employee name department Age file
0 us51243 Mark santosh engineering 25 file1.txt
1 us51243 Mark santosh engineering 25 file2.txt
Upvotes: 0
Reputation: 46759
You are currently reopening your file for each new text file which is causing all the contents to be overwritten. Also, you could use the CSV library to read your text files too by specifying the delimiter as :
and skipping any extra spaces:
import csv
import os
header = ["Employee id", "Employee name", "department", "Age"]
with open('emp.csv', 'w', newline='') as f_output:
csv_output = csv.writer(f_output)
csv_output.writerow(header)
for x in range(1, 1001):
filepath = os.path.normpath(r'C:\Text\file{}.txt'.format(x))
with open(filepath, 'r', newline='') as f_text:
csv_text = csv.reader(f_text, delimiter=':', skipinitialspace=True)
csv_output.writerow(row[1] for row in csv_text)
Upvotes: 2
Reputation: 5286
Try the following:
import csv
import os
FIELDS = ('Employee id', 'Employee name', 'department', 'Age')
def read_file(file, keys):
output = dict.fromkeys(keys)
for line in file:
line = line.rstrip().split(': ')
output[line[0]] = line[1]
return output
with open('emp.csv', 'w', newline='') as destiny:
writer = csv.DictWriter(destiny, FIELDS)
writer.writeheader()
for x in range(1, 1001):
with open(os.path.normpath('C:\\test\\file{}.txt'.format(x))) as origin:
writer.writerow(read_file(file, FIELDS))
Upvotes: 0