Reputation: 5954
I've scraped a bunch of words from the dictionary, and created a massive CSV file with all of them, one word per row.
I have another function, which reads from that massive CSV file, and then creates smaller CSV files.
The function is supposed to create CSV files with only 500 words/rows, but something is amiss. The first file has 501 words/rows. The rest of the files have 502 words/rows.
Man, maybe I'm tired, but I can't seem to spot what exactly is causing this in my code. Or is there nothing wrong with my code at all?
Below is the part of the function that I'm assuming is causing the problem. The full function can be seen below that.
def create_csv_files():
    limit = 500
    count = 0
    filecount = 1
    zfill = 3
    filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
    with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
        csvReader = csv.reader(readfile)
        for row in csvReader:
            term = row[0]
            if ' ' in term:
                term = term.replace(' ', '')
            if count <= limit:
                count += 1
            else:
                count = 0
                filecount += 1
                filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
            aw = 'a' if os.path.exists(filename) else 'w'
            with open(filename, aw, newline='') as writefile:
                fieldnames = [ 'term' ]
                writer = csv.DictWriter(writefile, fieldnames=fieldnames)
                writer.writerow({
                    'term': term
                })
def create_csv_files():
    limit = 500
    count = 0
    filecount = 1
    zfill = 3
    idiomsfilename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\idioms.csv'
    filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
    with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
        csvReader = csv.reader(readfile)
        for row in csvReader:
            term = row[0]
            if 'idiom' in row[0] and row[0] != ' idiom':
                term = row[0][:-5]
                aw = 'a' if os.path.exists(idiomsfilename) else 'w'
                with open(idiomsfilename, aw, newline='') as idiomsfile:
                    idiomsfieldnames = ['idiom']
                    idiomswriter = csv.DictWriter(idiomsfile, fieldnames=idiomsfieldnames)
                    idiomswriter.writerow({
                        'idiom':term
                    })
                continue
            else:
                if ' ' in term:
                    term = term.replace(' ', '')
                if count <= limit:
                    count += 1
                else:
                    count = 0
                    filecount += 1
                    filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
                aw = 'a' if os.path.exists(filename) else 'w'
                with open(filename, aw, newline='') as writefile:
                    fieldnames = [ 'term' ]
                    writer = csv.DictWriter(writefile, fieldnames=fieldnames)
                    writer.writerow({
                        'term': term
                    })
                print(term)
Upvotes: 0
Views: 174
Reputation: 1747
The odd row counts come from your if-else condition. You increment count whenever count is less than or equal to limit. On the very first iteration you increment count to 1 and then write your first term, then increment again on the next row, and so on. Because you use <= instead of the strict <, you still increment when count = 500 and write a 501st word before the else branch ever runs.
From the second file onwards, the first word is written right after count is reset to 0, without an increment. The counter then climbs to 501 again before the next reset, so you write 502 words this time.
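One way to check this reasoning is to replay the counter logic without touching any files. The sketch below is only illustrative: rows_per_file is a hypothetical helper, not part of the original code, and it simply tallies how many writes each file number would receive under the question's condition.

```python
from collections import Counter


def rows_per_file(total_rows, limit=500):
    """Replay the question's counter logic and tally writes per file number."""
    count = 0
    filecount = 1
    written = Counter()
    for _ in range(total_rows):
        if count <= limit:       # the condition used in the question
            count += 1
        else:
            count = 0            # reset, but this row is still written below
            filecount += 1
        written[filecount] += 1  # the write happens on every iteration
    return dict(written)


print(rows_per_file(1505))  # {1: 501, 2: 502, 3: 502}
```

The first file gets one extra row from the <= comparison, and every later file gets a second extra row because the row that triggers the reset is written without being counted.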
To fix this, check for count >= limit and start a new file if so. Increment count after you write to the CSV file, not before. That should help.
def create_csv_files():
    limit = 500
    count = 0
    filecount = 1
    zfill = 3
    filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
    with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
        csvReader = csv.reader(readfile)
        for row in csvReader:
            term = row[0]
            if ' ' in term:
                term = term.replace(' ', '')
            # The old "if count <= limit" branch is gone; only the rollover remains.
            # Start a new file once the current one already holds `limit` rows.
            if count >= limit:
                count = 0
                filecount += 1
                filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
            aw = 'a' if os.path.exists(filename) else 'w'
            with open(filename, aw, newline='') as writefile:
                fieldnames = [ 'term' ]
                writer = csv.DictWriter(writefile, fieldnames=fieldnames)
                writer.writerow({
                    'term': term
                })
            count += 1  # Increment here, after the row has been written
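If you would rather avoid the manual counter altogether, another option is to pull rows off the reader in chunks of limit and write each output file once. This is a minimal sketch, not a drop-in replacement: split_terms and its source_path and out_dir parameters are hypothetical names, it overwrites output files with 'w' instead of appending, it skips empty rows, and it leaves out the idiom handling from the full function.

```python
import csv
import os
from itertools import islice


def split_terms(source_path, out_dir, limit=500, width=3):
    """Write the first column of source_path into files of at most `limit` rows.

    Returns the list of output paths in the order they were written.
    """
    written = []
    with open(source_path, newline='') as readfile:
        # Lazily yield each term with spaces stripped, skipping empty rows.
        terms = (row[0].replace(' ', '') for row in csv.reader(readfile) if row)
        filecount = 0
        while True:
            chunk = list(islice(terms, limit))  # next `limit` terms, or fewer at the end
            if not chunk:
                break
            filecount += 1
            name = 'terms{}.csv'.format(str(filecount).zfill(width))
            path = os.path.join(out_dir, name)
            with open(path, 'w', newline='') as writefile:
                csv.writer(writefile).writerows([term] for term in chunk)
            written.append(path)
    return written
```

Because each chunk is sized by islice, there is no comparison operator to get wrong, and each output file is opened only once instead of once per row.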
Upvotes: 2