oldboy

Reputation: 5954

Count Not Counting Properly

I've scraped a bunch of words from the dictionary, and created a massive CSV file with all of them, one word per row.

I have another function, which reads from that massive CSV file, and then creates smaller CSV files.

The function is supposed to create CSV files with only 500 words/rows, but something is amiss. The first file has 501 words/rows. The rest of the files have 502 words/rows.

Man, maybe I'm tired, but I can't seem to spot what exactly is causing this in my code. Or is there nothing wrong with my code at all?

Below is the part of the function that I'm assuming is causing the problem. The full function can be seen below that.

Suspect Part of Function

def create_csv_files():
  limit = 500
  count = 0
  filecount = 1
  zfill = 3
  filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
  with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
    csvReader = csv.reader(readfile)
    for row in csvReader:
      term = row[0]
      if ' ' in term:
        term = term.replace(' ', '')
      if count <= limit:
        count += 1
      else:
        count = 0
        filecount += 1
        filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
      aw = 'a' if os.path.exists(filename) else 'w'
      with open(filename, aw, newline='') as writefile:
        fieldnames = [ 'term' ]
        writer = csv.DictWriter(writefile, fieldnames=fieldnames)
        writer.writerow({
          'term': term
        })

The Whole Function

def create_csv_files():
  limit = 500
  count = 0
  filecount = 1
  zfill = 3
  idiomsfilename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\idioms.csv'
  filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
  with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
    csvReader = csv.reader(readfile)
    for row in csvReader:
      term = row[0]
      if 'idiom' in row[0] and row[0] != ' idiom':
        term = row[0][:-5]
        aw = 'a' if os.path.exists(idiomsfilename) else 'w'
        with open(idiomsfilename, aw, newline='') as idiomsfile:
          idiomsfieldnames = ['idiom']
          idiomswriter = csv.DictWriter(idiomsfile, fieldnames=idiomsfieldnames)
          idiomswriter.writerow({
            'idiom':term
          })
        continue
      else:
        if ' ' in term:
          term = term.replace(' ', '')
        if count <= limit:
          count += 1
        else:
          count = 0
          filecount += 1
          filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
        aw = 'a' if os.path.exists(filename) else 'w'
        with open(filename, aw, newline='') as writefile:
          fieldnames = [ 'term' ]
          writer = csv.DictWriter(writefile, fieldnames=fieldnames)
          writer.writerow({
            'term': term
          })
      print(term)

Upvotes: 0

Views: 174

Answers (1)

absolutelydevastated

Reputation: 1747

The files end up with the wrong number of rows because of your if-else conditions.

You increment count whenever count is less than or equal to limit. On the very first iteration, count goes from 0 to 1 and you write your first term, then it increments again on the next row, and so on. Because you use <= instead of a strict inequality, you still increment when count = 500, so a 501st word is written to the first file.

From the second file onwards, the first word is written at count = 0, in the same iteration that resets the counter. The counter then runs up to 501 again before the next reset, which adds another 501 words, so each of those files gets 502 words.
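
To see the off-by-one without touching any files, the counter logic from the question can be replayed on its own. This is only a sketch: rows_per_file is a hypothetical helper, and 1600 is an arbitrary row count picked for illustration.

```python
def rows_per_file(total_rows, limit=500):
    """Replay the question's counter logic; return how many rows land in each file."""
    sizes = [0]   # sizes[-1] is the row count of the file currently being written
    count = 0
    for _ in range(total_rows):
        if count <= limit:
            count += 1
        else:
            count = 0
            sizes.append(0)   # stands in for switching to the next filename
        sizes[-1] += 1        # stands in for writer.writerow(...)
    return sizes

print(rows_per_file(1600))  # [501, 502, 502, 95]
```

The first file stops at 501 rows and every later full file at 502, which matches what the question reports.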

To fix this, check whether count >= limit and start a new file if so. Increment count after you write to the CSV file, not before. That should help.

import csv
import os

def create_csv_files():
  limit = 500
  count = 0
  filecount = 1
  zfill = 3
  filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format('1'.zfill(zfill))
  with open('C:\\Users\\Anthony\\Desktop\\Scrape\\Results\\dictionary.csv') as readfile:
    csvReader = csv.reader(readfile)
    for row in csvReader:
      term = row[0]
      if ' ' in term:
        term = term.replace(' ', '')
      # The old if branch is gone; only the rollover check (the old else) remains
      if count >= limit:
        count = 0
        filecount += 1
        filename = 'C:\\Users\\Anthony\\Desktop\\Scrape\\Dictionary\\terms{}.csv'.format(str(filecount).zfill(zfill))
      aw = 'a' if os.path.exists(filename) else 'w'
      with open(filename, aw, newline='') as writefile:
        fieldnames = [ 'term' ]
        writer = csv.DictWriter(writefile, fieldnames=fieldnames)
        writer.writerow({
          'term': term
        })
        count += 1  # Increment after the write, not before
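
As an alternative that avoids the manual counter altogether, the rows could be grouped into fixed-size chunks first and each chunk written in one go. This is a rough sketch rather than a drop-in replacement: chunked and write_chunks are hypothetical helper names, out_pattern is assumed to be a path containing one {} placeholder (like the terms{}.csv path above), and each file is opened with 'w', so an existing file is overwritten instead of appended to.

```python
import csv
from itertools import islice

def chunked(iterable, size):
    """Yield successive lists of at most `size` items."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk

def write_chunks(terms, out_pattern, limit=500, zfill=3):
    """Write each chunk to its own CSV file, one term per row; return the file names."""
    written = []
    for filecount, chunk in enumerate(chunked(terms, limit), start=1):
        filename = out_pattern.format(str(filecount).zfill(zfill))
        with open(filename, 'w', newline='') as writefile:
            csv.writer(writefile).writerows([term] for term in chunk)
        written.append(filename)
    return written
```

Since each file is opened once per chunk instead of once per row, there is no counter to get wrong, and the last file simply holds whatever is left over.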

Upvotes: 2
