Reputation: 65

More efficient way to go through .csv file?

I'm trying to parse through a few dictionary a in .CSV file, using two lists in separate .txt files so that the script knows what it is looking for. The idea is to find a line in the .CSV file which matches both a Word and IDNumber, and then pull out a third variable if there is a match. However, the code is running really slow. Any ideas how I could make it more efficient?

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0

        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion= row['CurrentProportion']

            if FoundCurrent == 0:
                CurrentProportion=0
            else:
                CurrentProportion=1
                print('found')

Upvotes: 0

Answers (3)

Bernard

Reputation: 301

Your are opening the CSV file N times where N = (# lines in IDS.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid that by saving its content to a dictionary or a list of lists.

The same way you open dictionary_WordsOfInterest.txt every time you read a new line from IDS.txt

Also It seems that you are looking for any combination of pair (CurrentIDNumber, CurrentWord) possible from the txt files. So for example you can store the ids in a set, and the words in an other, and for each row in the csv file, you can check if both the id and the word are in their respective set.

Upvotes: 1

Serge Ballesta

Reputation: 149155

As you use readlines for the .txt files, you already build an in memory list with them. You should build those lists first and them only parse once the csv file. Something like:

import csv

IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'

WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')

numberlist = open(IDNumberList_filename).readlines():
wordlist =  open(WordsOfInterest_filename).readlines():

FoundCurrent = 0

with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist :

                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion= row['CurrentProportion']

                if FoundCurrent == 0:
                    CurrentProportion=0
                else:
                    CurrentProportion=1
                    print('found')

Beware: untested

Upvotes: 1

Dmitriy Sorochenkov

Reputation: 114

First of all, consider to load file dictionary_individualwords.csv into the memory. I guess that python dictionary is proper data structure for this case.

Upvotes: 2

More efficient way to go through .csv file?

Answers (3)

Related Questions