Reputation: 818
I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it I was trying to display only a part of it by writing it to a text file.
csvfile = open(file_path, "rb")
rows = csvfile.readlines()
text_file = open("output.txt", "w")
row_num = 0
while row_num < 20:
    text_file.write(", ".join(row[row_num]))
    row_num += 1
text_file.close()
I want to iterate through the CSV file and write only a small section of it to a text file so I can look at how it does this and see if it would be of any use to me. Currently the text file ends up empty.
One way I thought might work would be to iterate through the file with a for loop that exits after a certain number of iterations, but I could be wrong and I'm not sure how to do this. Any ideas?
Upvotes: 0
Views: 1361
Reputation: 3452
A simple solution would be to just do :
#!/usr/bin/python
# -*- encoding: utf-8 -*-
file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            # Check the index before writing so exactly 20 lines are copied
            if i >= 20:
                break
            textfile.write(row)
Explanation :
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
Instead of calling open and close explicitly, it is recommended to use a with statement. Put the code that should run while the file is open in the indented block below it; the file is closed automatically when the block ends.
'rb' and 'wb' are the modes that open a file for reading and for writing, respectively, in binary mode.
for i, row in enumerate(csvfile):
This line reads your CSV file line by line, and unpacking into the tuple (i, row) gives you both the content of the row and its index. enumerate is one of Python's handy built-in functions; see its documentation for more about it.
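To see what enumerate yields, here is a minimal sketch that uses an in-memory io.StringIO object as a stand-in for an open file. The sample data and the names fake_csv and first_lines are made up for illustration, and the snippet assumes Python 3.

```python
import io

# Hypothetical stand-in for an open CSV file: iterating it yields one line at a time.
fake_csv = io.StringIO("id,name\n1,ann\n2,bob\n3,cy\n")

first_lines = []
for i, row in enumerate(fake_csv):
    if i >= 2:  # stop once two lines have been collected
        break
    first_lines.append(row)

print(first_lines)  # ['id,name\n', '1,ann\n']
```

Each row still ends with its newline character, which is why the lines can be written to the output file unchanged.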
Hope this helps !
EDIT : Note that Python has a csv module that can do that without enumerate :
# -*- encoding: utf-8 -*-
import csv
file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i<20:
            row = next(reader)
            writer.writerow(row)
            i += 1
All we need to use is its reader and writer. Calling next(reader) reads one row, and writer.writerow(row) writes one. Note that here the variable row is not a string but a list of strings, because the reader splits each line into fields by itself. Since every row is parsed and then re-serialized, this is unlikely to be faster than the previous solution, but it correctly handles quoted fields that contain commas or line breaks.
Also, this lets you look at any part of the file, not necessarily the beginning: call next(reader) without writing the row to skip ahead until you reach the first row you want, then write rows until you reach the last one.
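One possible way to pick an arbitrary window of rows is itertools.islice, which skips and stops for you. This is a sketch rather than part of the answer above: the function name copy_rows and its parameters are hypothetical, and it assumes Python 3, where the csv module expects text-mode files opened with newline="" instead of 'rb' and 'wb'.

```python
import csv
from itertools import islice


def copy_rows(src_path, dst_path, start=0, stop=20):
    """Copy rows start..stop-1 (0-based) from one CSV file to another."""
    # newline="" lets the csv module handle line endings itself (Python 3).
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        # islice consumes the skipped rows and ends quietly if the file is short.
        writer.writerows(islice(reader, start, stop))
```

For example, copy_rows('./test.csv', 'output.txt', 100, 120) would write rows 100 to 119. Unlike the while loop with next(reader), it does not raise StopIteration when the file has fewer rows than requested.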
Upvotes: 1
Reputation: 599490
There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular, reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines.
Instead you could use a for loop with enumerate and break when necessary.
csvfile = open(file_path, "rb")
text_file = open("output.txt", "w")
for i, row in enumerate(csvfile):
    # Stop before writing once 20 lines have been copied
    if i >= 20:
        break
    text_file.write(row)
text_file.close()
csvfile.close()
You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:
with open(file_path, "rb") as csvfile:
    #your code here involving csvfile
#now the csvfile is closed!
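Combining with blocks and an early stop, a minimal sketch might look like the following. It swaps the enumerate-and-break loop for itertools.islice, which is a different technique from the loop shown earlier; the function name head and its parameters are hypothetical, and the code assumes Python 3.

```python
from itertools import islice


def head(src_path, dst_path, n=20):
    """Copy the first n lines of src_path to dst_path."""
    # Binary mode on both sides copies the bytes unchanged, line endings included.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # islice stops after n lines, so the rest of the file is never read.
        dst.writelines(islice(src, n))
    # Both files are closed here, even if an error occurred inside the block.
```

Calling head(file_path, "output.txt") would copy the first 20 lines without loading the remaining rows into memory.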
Also note that Python might not be the best tool for this: you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.
Upvotes: 2