ubuntunoob
ubuntunoob

Reputation: 249

Python - Writing to CSV file spits all text into one cell

I'm new to Python, and I'm trying to take a few text files which have data in the following format :

time_stamp duration word1 p1 word2 p2 ....

and write the data into a CSV file. With each row that I write in, I also want to include the name of the file from which that row is taken.

My problem is that the code I'm executing seems to write all the text into one single cell. When I open the CSV file with LibreOffice, it displays junk characters saying that it cannot display the data correctly because the maximum limit of characters per cell is exceeded.

What am I doing wrong?

import csv, os

path = "./TRIAL with CSV"
all_the_files = os.listdir(path)

for each_file in all_the_files:
    file = os.path.join(path, each_file)
    ifile  = open(file, "rb")
    reader = csv.reader(ifile)
    ofile  = open('CSVtrial.csv', "a")
    writer = csv.writer(ofile, delimiter='\t', quoting=csv.QUOTE_MINIMAL)

    for row in reader:
        writer.writerow([each_file,row])


ifile.close()
ofile.close()

EDIT : I don't think I can attach a file here, but a sample input file would be a .txt file which has the following lines :

186.860 0.060 (01) 0.89221220 ha(01) 0.04941113 ah(01) 0.04552169 oo(01) 0.01065865 o(01) 0.00219633

187.110 0.410 bongga(01) 1.00000000

187.520 0.349 naman(01) 0.99999601

187.872 0.598 niyan(01) 0.81980968 'yan(02) 0.16510634 iyan(02) 0.00814381 niya(01) 0.00471968 ganyan(01) 0.00139249 diyan(03) 0.00082159

There are some 50,000 such .txt files, which I'm trying to compile into a single CSV file to make it easier to analyse the data.

Here is a screenshot of the kind of error I get when I try to view the CSV file in LibreOffice - https://i.sstatic.net/QS0Qa.png

For now I've given up on this and am just trying to work on vim. If anyone has any other suggestions about how I can compile these .txt files into something like a spreadsheet which would make it easier to analyse the data, I would appreciate it.

Upvotes: 1

Views: 3254

Answers (3)

mlnyc
mlnyc

Reputation: 2726

First, not related to your question: Are you sure you want a tab delimiter? Maybe a comma delimited would be better for the output.

Second, it looks like your input files are SPACE delimited, and yet you create the reader with the default COMMA delimited. You might want to try to pass in a delimiter:

reader = csv.reader(ifile, delimiter=' ')

Not sure if that helps, some sample data might be useful so I can muck around with it here.

EDIT: I also noticed that you are trying to concatenate files into a single file? Am I correct in that assumption?

For that, I might use other (i.e. unix) tools that will get that done much faster. If your goal is to just do something fun in python my recommendation might be to declare a single writer outside the loop rather than reopening the same file with append mode.

EDIT2: without knowing much more about your code, I might approach it as follows:

writer = csv.writer(open('CSVTrial.csv', 'w'), delimiter='\t', quoting=csv.QUOTE_MINIMAL)
for each_file in all_the_files:
    file = os.path.join(path, each_file)
    with open(file, "rb") as ifile:
        reader = csv.reader(ifile, delimiter=" ")
        for row in reader:
            writer.writerow([each_file] + row)
ofile.close()

notice of course that the with statement in the reader file is meant to avoid the explicit close() on the file. The writer can have an explicit close or you can use the same treatment. It's also nice that the input files are guaranteed to close even if exception happens. You have to make sure the output file closes as well, using some try...catch...finally

Hope this helps!

Upvotes: 0

bereal
bereal

Reputation: 34312

Assuming that the delimiters are correct, this looks wrong:

for row in reader:
    writer.writerow([each_file,row])

because that will create a row with exactly two columns, where the second cell will contain the whole original row, serialized.

Should be rather:

for row in reader:
    writer.writerow([each_file] + row)

Upvotes: 3

Alvaro
Alvaro

Reputation: 12037

Try opening the file with a simple text editor.. probably the problem isn't in your code. Some spreadsheet apps use "," as a separator. Others use ";" or even " " or tab. If the file looks like what it should, try using a different separator

Upvotes: 0

Related Questions