Reputation: 418
python newb here - I'm trying to format a set of really gross csv's I was sent so that I can throw them into a nice postgres table for querying and analysis. In order to do this, I first cleanse them using csv.writer to remove the blank rows and double quotes that wrap each entry. Here's what my code looks like:
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
It works perfectly fine, and does exactly what I want it to do. The output csv looks great. Next, I try to essentially chop off a certain number of rows that contain completely unnecessary junk from both the beginning and end of the newly cleansed csv files (omit the first 8 rows and the last 2). For a reason that I truly cannot ascertain, the csv's output from this portion of the code (indented the same as the earlier 'with' block) are completely empty:
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
I know that because of my 'with' usage, the close() at the end of each block is redundant - I tried it as an approach after looking here. I also tried putting the second 'with' block into a different file and running that after running the first 'with' block, but still to no avail. Your help is greatly appreciated!
Also, here's the whole file:
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        reader2 = csv.reader(inp2)
        row_count = sum(1 for row in reader2)
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
Thanks!
Upvotes: 1
Views: 85
Reputation: 16224
You can quickly fix the code like this (I commented the problem lines; as @Hugh Bothwell said, you had already read all the data from the variable reader2):
import os
import csv
import glob
from itertools import islice

files = glob.glob('/Users/foo/bar/*.csv')
# Loop through all of the csv's
for file in files:
    # Get the filename from the path
    outfile = os.path.basename(file)
    with open(file, 'rb') as inp, open('/Users/foo/baz/' + outfile, 'wb') as out:
        reader = csv.reader(inp)
        writer = csv.writer(out)
        for row in reader:
            if row:
                writer.writerow(row)
        out.close()
    with open('/Users/foo/baz/' + outfile, 'rb') as inp2, open('/Users/foo/qux/' + outfile, 'wb') as out2:
        writer2 = csv.writer(out2)
        row_count = sum(1 for row in csv.reader(inp2))  # count the rows with a throwaway reader instead of reader2
        inp2.seek(0)  # rewind: the count above consumed the underlying file
        reader2 = csv.reader(inp2)  # create reader2 only after rewinding
        last_line_index = row_count - 3
        for row in islice(reader2, 7, last_line_index):
            writer2.writerow(row)
        out2.close()
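As a side note, you can avoid the double read entirely by loading the rows into a list once and slicing from both ends; a minimal sketch (assuming Python 3, hence text-mode opens with newline='', and with a hypothetical helper name and skip counts of 8 and 2):

```python
import csv

def trim_csv(src, dst, skip_head=8, skip_tail=2):
    # Materialize all non-blank rows so we can slice from both ends
    with open(src, newline='') as inp:
        rows = [row for row in csv.reader(inp) if row]
    with open(dst, 'w', newline='') as out:
        # Drop the first skip_head and the last skip_tail rows in one pass
        csv.writer(out).writerows(rows[skip_head:len(rows) - skip_tail])
```

This also folds the blank-row cleanup and the trimming into a single step, so the intermediate /Users/foo/baz/ copy is no longer needed.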
Upvotes: 1
Reputation: 56634
The guilty party is

    row_count = sum(1 for row in reader2)

It reads all the data from reader2; so when you then try for row in islice(reader2, 7, last_line_index), you don't get any data.
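To see the effect in isolation, here is a tiny demonstration (using io.StringIO as a stand-in for the open file):

```python
import csv
import io

data = io.StringIO("a,1\r\nb,2\r\nc,3\r\n")
reader = csv.reader(data)

first_pass = list(reader)   # consumes the underlying stream completely
second_pass = list(reader)  # the same reader yields nothing afterwards

print(len(first_pass), len(second_pass))  # 3 0
```

A csv.reader is a one-shot iterator over the file handle, so any full pass over it (a sum, a list(), a for loop) leaves both the reader and the file at the end.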
Also, you are probably reading a lot of blank rows because you open the file in binary mode; in Python 3, open it in text mode with newline='' instead:

    with open('file.csv', newline='') as inf:
        rd = csv.reader(inf)
Upvotes: 2