Reputation: 317
I am working on code that needs to convert a dataset of sentences in a txt file to a CSV file. Here is my code, which works fine for converting the contents of the input txt file into CSV-style rows.
However, I am not able to write an output CSV file. I am new to Python programming, so I don't know my way around it yet.
Here's my code:
def txtTOcsv():
    output_csv = []
    with open("dataset.txt", "r") as myfile:
        lines = myfile.readlines()
    for line in lines:
        row = line.split()
        for i in row[1:]:
            tokens = (row[0],i)
            print tokens
            output_csv.append(tokens)
    with open(output_csv,'w') as out_file:
        csv.writer(out_file)
It works fine up to the print tokens line, which prints all the columns with commas in between, just as I want. But when it reaches the line where the output should be saved to a CSV file, it gives this error:
with open(output_csv,'w') as out_file:
TypeError: coercing to Unicode: need string or buffer, list found
Any help would be greatly appreciated. Thanks.
Upvotes: 0
Views: 14413
Reputation: 48599
Besides the problem Tzach identified, there are a couple of other problems:
There is no reason to read all the lines of the file into a list.
There is no need to create another list to hold all your processed lines.
If you process a file that happens to be 5GB in size, then your code will copy that data twice into memory, which would require 10GB of memory. That would probably overwhelm your system's memory.
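What you can do instead is read one line, process it, and write the result to the output file before reading the next line.
That way, you are only reading a very small amount of text into memory at one time. Here is how you can process a file of any size: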
import csv
with open("data.txt", newline='') as infile:
    with open('csv3.csv', 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            first_word, *words = line.split()
            for word in words:
                writer.writerow([first_word, word])
This line is a little tricky:
first_word, *words = line.split()
If you do this:
x, y = ["hello", "world"]
Python will assign "hello" to x and "world" to y. In other words, Python takes the first element on the right and assigns it to the first variable on the left, then takes the second element on the right and assigns it to the second variable on the left, and so on.
Next, line.split() returns a list, producing something like this:
first_word, *words = ["The", "apple", "is", "red"]
Once again, Python assigns the first element on the right to the first variable on the left, so "The" gets assigned to first_word. Next, the * tells Python to gather the rest of the elements on the right and assign them all to the variable words, which makes words a list.
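Here is a small sketch of the unpacking described above. It assumes Python 3, since starred assignment is not available in Python 2, and the sample sentence and the names only, rest, head and tail are made up for illustration. It also shows one edge case worth knowing about: a blank line splits into an empty list, which leaves nothing for the first variable.

```python
# Starred assignment (Python 3 only): the first item goes to first_word,
# and the remaining items are collected into a list named words.
first_word, *words = "The apple is red".split()

print(first_word)  # The
print(words)       # ['apple', 'is', 'red']

# With a single-word line, the starred variable is simply an empty list.
only, *rest = ["hello"]
print(rest)        # []

# A blank line splits into an empty list, so there is nothing to give the
# first variable and the unpacking raises ValueError.
blank_line_failed = False
try:
    head, *tail = "".split()
except ValueError:
    blank_line_failed = True
print(blank_line_failed)  # True
```

If the input file may contain blank lines, one option is to skip them (for example with a check such as "if not line.strip(): continue") before unpacking.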
Upvotes: 1
Reputation: 13376
output_csv is a list, and open() expects a file name.
Try
with open("output.csv",'w') as out_file:
    csv.writer(out_file).writerows(output_csv)
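Putting that fix back into the original function, a sketch might look like the following. It assumes Python 3 (the question's print statement is Python 2 syntax). The function name txt_to_csv and its two path parameters are illustrative and not from the question.

```python
import csv


def txt_to_csv(in_path, out_path):
    """Pair the first word of each line with every other word on that line."""
    rows = []
    with open(in_path, "r") as infile:
        for line in infile:
            parts = line.split()
            # parts[1:] is empty for blank or one-word lines, so they add no rows.
            for word in parts[1:]:
                rows.append((parts[0], word))

    # open() needs a file name (a string), not the list of rows.
    # newline="" is what the csv module recommends for files it writes to.
    with open(out_path, "w", newline="") as out_file:
        csv.writer(out_file).writerows(rows)
    return rows
```

Calling txt_to_csv("dataset.txt", "output.csv") would then write one "first word, other word" pair per row.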
Upvotes: 1