Clíodhna

Reputation: 818

Loop that will iterate a certain number of times through a CSV in Python

I have a large CSV file (~250000 rows) and before I work on fully parsing and sorting it I was trying to display only a part of it by writing it to a text file.

   csvfile = open(file_path, "rb")
   rows = csvfile.readlines()
   text_file = open("output.txt", "w")
   row_num = 0
   while row_num < 20:
       text_file.write(", ".join(row[row_num]))
       row_num += 1
   text_file.close()

I want to iterate through the CSV file and write only a small section of it to a text file, so I can look at the data and see whether it would be of any use to me. Currently the text file ends up empty.

One way I thought might work would be to iterate through the file with a for loop that exits after a certain number of iterations, but I could be wrong and I'm not sure how to do this. Any ideas?

Upvotes: 0

Views: 1361

Answers (2)

AdrienW

Reputation: 3452

A simple solution would be to just do:

#!/usr/bin/python
# -*- encoding: utf-8 -*-

file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    with open('output.txt', 'wb') as textfile:
        for i, row in enumerate(csvfile):
            if i >= 20:  # stop once the first 20 lines have been written
                break
            textfile.write(row)

Explanation:

with open(file_path, 'rb') as csvfile:
with open('output.txt', 'wb') as textfile:

Instead of calling open and close explicitly, it is recommended to use the with statement. Just write the code you want to execute while the file is open at a new level of indentation; the file is closed automatically when the block ends.

'rb' and 'wb' are the modes you need to open a file for 'reading' and 'writing', respectively, in 'binary mode'.

for i, row in enumerate(csvfile):

This line lets you read your CSV file line by line, and unpacking each item into a tuple (i, row) gives you both the content of the row and its index. enumerate is one of Python's awesome built-in functions: see the official documentation for more about it.
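
As a quick standalone illustration (with made-up data), enumerate yields (index, item) pairs for any iterable:

# enumerate pairs each item with its index, starting at 0
lines = ["alpha\n", "beta\n", "gamma\n"]
for i, line in enumerate(lines):
    print("%d: %s" % (i, line.strip()))
# 0: alpha
# 1: beta
# 2: gamma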

Hope this helps!


EDIT: Note that Python has a built-in csv module that can do this without enumerate:

# -*- encoding: utf-8 -*-

import csv

file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        i = 0
        while i < 20:
            row = next(reader)  # raises StopIteration if the file has fewer rows
            writer.writerow(row)
            i += 1

All we need here is its reader and writer. We use next (which reads one row from the reader) and writerow (which writes one). Note that here the variable row is not a string but a list of strings, because the reader does the splitting itself. It might be faster than the previous solution.
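
To see that, a tiny sketch with inline data (csv.reader accepts any iterable of lines, not just a file object):

import csv

reader = csv.reader(["a,b,c", "1,2,3"])
print(next(reader))  # ['a', 'b', 'c'] -- already split into a list of strings
print(next(reader))  # ['1', '2', '3']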

Also, this has the major advantage of letting you look anywhere you want in the file, not necessarily from the beginning (just change the bounds for i), as in the sketch below.
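
For instance, here is a sketch (reusing the same './test.csv' as above) that copies rows 100 to 119 instead of the first 20, using itertools.islice rather than a manual counter:

import csv
from itertools import islice

file_path = './test.csv'
with open(file_path, 'rb') as csvfile:
    reader = csv.reader(csvfile)
    with open('output.txt', 'wb') as textfile:
        writer = csv.writer(textfile)
        # islice skips the first 100 rows, then yields the next 20;
        # it simply stops early if the file is shorter than that
        for row in islice(reader, 100, 120):
            writer.writerow(row)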

Upvotes: 1

Daniel Roseman

Reputation: 599490

There's nothing specifically wrong with what you're doing, but it's not particularly Pythonic. In particular, reading the whole file into memory with readlines() at the start seems pointless if you're only using 20 lines.

Instead you could use a for loop with enumerate and break when necessary.

csvfile = open(file_path, "rb")
text_file = open("output.txt", "w")
for i, row in enumerate(csvfile):
    if i >= 20:  # stop after the first 20 lines
        break
    text_file.write(row)
text_file.close()
csvfile.close()

You could further improve this by using with blocks to open the files, rather than closing them explicitly. For example:

with open(file_path, "rb") as csvfile:
    # your code here involving csvfile
# now the csvfile is closed!
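
Putting both suggestions together, one possible final version would be:

with open(file_path, "rb") as csvfile:
    with open("output.txt", "w") as text_file:
        for i, row in enumerate(csvfile):
            if i >= 20:
                break
            text_file.write(row)
# both files are closed automatically here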

Also note that Python might not be the best tool for this; you could do it directly from Bash, for example, with just head -n20 csvfile.csv > output.txt.

Upvotes: 2
