Anshul Vyas

Reputation: 317

TypeError: coercing to Unicode: need string or buffer, list found

I am working on code that needs to convert a dataset of sentences in a txt file to a csv file. Here is my code, which correctly converts the contents of the input txt file into csv-style rows.

However, I am not able to write the output csv file. I am new to Python programming, so I don't know my way around it yet.

Here's my code:

def txtTOcsv():

    output_csv = []

    with open("dataset.txt", "r") as myfile:
        lines = myfile.readlines()
        for line in lines:
            row = line.split()
            for i in row[1:]:
                tokens = (row[0], i)
                print tokens
                output_csv.append(tokens)

    with open(output_csv,'w') as out_file:
        csv.writer(out_file)

It works fine up to

print tokens

and prints all the columns with commas in between, just as I want. But when it reaches the line where the output should be saved to a csv file, it gives this error:

with open(output_csv,'w') as out_file:
TypeError: coercing to Unicode: need string or buffer, list found

Any help would be greatly appreciated. Thanks.

Upvotes: 0

Views: 14413

Answers (2)

7stud

Reputation: 48599

Besides the problem Tzach identified, there are a couple of other problems:

  1. There is no reason to read all the lines of the file into a list.

  2. There is no need to create another list to hold all your processed lines.

If you process a file that happens to be 5GB in size, your code will hold that data in memory twice (once in the list of lines and again in the list of processed rows), which would require roughly 10GB of memory. That would probably overwhelm your system's memory.

What you can do is:

  1. Read in one line.
  2. Process the line.
  3. Write the processed line to the csv file.
  4. Read in the next line.

That way, you are only reading a very small amount of text into memory at one time. Here is how you can process a file of any size (note that this uses Python 3 syntax):

import csv

with open("data.txt", newline='') as infile:
    with open('csv3.csv', 'w', newline='') as outfile:
        writer = csv.writer(outfile)

        # Iterating over the file object reads one line at a time
        for line in infile:
            first_word, *words = line.split()

            # Write one row per (first word, following word) pair
            for word in words:
                writer.writerow([first_word, word])

This line is a little tricky:

first_word, *words = line.split()

If you do this:

x, y = ["hello", "world"]

Python will assign "hello" to x and "world" to y. In other words, Python takes the first element on the right and assigns it to the first variable on the left, then takes the second element on the right and assigns it to the second variable on the left, and so on.

Next, line.split() returns a list, producing something like this:

first_word, *words = ["The", "apple", "is", "red"]

Once again, Python assigns the first element on the right to the first variable on the left, so "The" gets assigned to first_word. Next, the * tells Python to gather the rest of the elements on the right and assign them all to the variable words, which makes words a list.
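Here is a small sketch of that unpacking behaviour, including two edge cases worth knowing about. It assumes Python 3, and the sample strings and the extra variable names (only_word, rest, blank_line_error) are made up purely for illustration:

```python
# Extended unpacking: the starred name collects whatever is left over.
first_word, *words = "The apple is red".split()
# first_word is "The" and words is ["apple", "is", "red"]

# A line with a single word leaves the starred name as an empty list.
only_word, *rest = "Hello".split()

# A blank line splits to [], so there is nothing to assign to the first
# name and Python raises ValueError.
try:
    missing, *nothing = "".split()
    blank_line_error = None
except ValueError as exc:
    blank_line_error = exc
```

So if your input file may contain blank lines, you would want to skip them before unpacking.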

Upvotes: 1

Tzach

Reputation: 13376

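output_csv is a list, but open() expects a file name (a string) as its first argument. Also note that csv.writer(out_file) on its own writes nothing; you have to call writerows() (or writerow()) on it.

Try

with open("output.csv", 'w') as out_file:
    csv.writer(out_file).writerows(output_csv)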
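For completeness, here is a minimal sketch of the whole function with that fix applied. It assumes Python 3, and the function name txt_to_csv and its two path parameters are my own placeholders, not something from the question:

```python
import csv


def txt_to_csv(txt_path, csv_path):
    """Pair the first word of each line with every following word and
    write the pairs to a CSV file. Returns the list of pairs."""
    rows = []
    with open(txt_path, "r") as infile:
        for line in infile:
            parts = line.split()
            # parts[1:] is empty for blank or one-word lines,
            # so those lines simply add no rows
            for word in parts[1:]:
                rows.append((parts[0], word))

    # open() gets the file name (a string); the list goes to writerows()
    with open(csv_path, "w", newline="") as out_file:
        csv.writer(out_file).writerows(rows)
    return rows
```

You would call it as txt_to_csv("dataset.txt", "output.csv"). This still builds the full list in memory, like the original code, so it is only a reasonable choice for inputs that fit comfortably in RAM.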

Upvotes: 1
