Reputation: 65
I'm trying to parse through a dictionary in a .CSV file, using two lists in separate .txt files so that the script knows what it is looking for. The idea is to find a line in the .CSV file which matches both a Word and an IDNumber, and then pull out a third variable if there is a match. However, the code is running really slowly. Any ideas how I could make it more efficient?
import csv
IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'
WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')
for CurrentIDNumber in open(IDNumberList_filename).readlines():
    for CurrentWord in open(WordsOfInterest_filename).readlines():
        FoundCurrent = 0
        with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
            reader = csv.DictReader(csvfile)
            for row in reader:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion= row['CurrentProportion']
        if FoundCurrent == 0:
            CurrentProportion=0
        else:
            CurrentProportion=1
            print('found')
Upvotes: 0
Views: 496
Reputation: 301
You are opening the CSV file N times, where N = (# lines in IDs.txt) * (# lines in dictionary_WordsOfInterest.txt). If the file is not too large, you can avoid that by saving its content to a dictionary or a list of lists.
In the same way, you reopen dictionary_WordsOfInterest.txt every time you read a new line from IDs.txt.
Also, it seems that you are looking for every possible (CurrentIDNumber, CurrentWord) pair from the txt files. So, for example, you can store the ids in one set and the words in another, and for each row in the csv file check whether both the id and the word are in their respective sets.
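A minimal sketch of the set-based idea, assuming the column names IDNumber, Word and CurrentProportion from the question. The helper names load_lines and find_matches are hypothetical, not from the original code. Lines are stripped because iterating over a file keeps the trailing newline, which would otherwise prevent any comparison with a CSV field from matching.

```python
import csv

def load_lines(path):
    """Return the non-empty lines of a text file as a set, whitespace stripped."""
    with open(path, encoding='utf-8') as f:
        return {line.strip() for line in f if line.strip()}

def find_matches(csv_path, ids, words):
    """Scan the CSV once and collect CurrentProportion for every matching row."""
    matches = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            # Set membership tests are O(1) on average, so each row costs
            # two lookups instead of a scan over every (id, word) pair.
            if row['IDNumber'] in ids and row['Word'] in words:
                # If a pair appears in several rows, the last row wins.
                matches[(row['IDNumber'], row['Word'])] = row['CurrentProportion']
    return matches
```

Called as find_matches('dictionary_individualwords.csv', load_lines('IDs.txt'), load_lines('dictionary_WordsOfInterest.txt')), this reads each of the three files exactly once and returns a dictionary keyed by (IDNumber, Word).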
Upvotes: 1
Reputation: 149155
As you use readlines for the .txt files, you already build an in-memory list from each of them. You should build those lists first and then parse the csv file only once. Something like:
import csv
IDNumberList_filename = 'IDs.txt'
WordsOfInterest_filename = 'dictionary_WordsOfInterest.txt'
Dictionary_filename = 'dictionary_individualwords.csv'
WordsOfInterest_ReadIn = open(WordsOfInterest_filename).read().split('\n')
#IDNumberListtoRead = open(IDNumberList_filename).read().split('\n')
# Read both .txt files once; strip the trailing newline that readlines() keeps,
# otherwise the comparisons below can never succeed
numberlist = [line.strip() for line in open(IDNumberList_filename).readlines()]
wordlist = [line.strip() for line in open(WordsOfInterest_filename).readlines()]
FoundCurrent = 0
with open(Dictionary_filename, newline='', encoding='utf-8') as csvfile:
    reader = csv.DictReader(csvfile)
    # The csv file is parsed only once
    for row in reader:
        for CurrentIDNumber in numberlist:
            for CurrentWord in wordlist:
                if ((row['IDNumber'] == CurrentIDNumber) and (row['Word'] == CurrentWord)):
                    FoundCurrent = 1
                    CurrentProportion = row['CurrentProportion']
if FoundCurrent == 0:
    CurrentProportion=0
else:
    CurrentProportion=1
    print('found')
Beware: untested
Upvotes: 1
Reputation: 114
First of all, consider loading the file dictionary_individualwords.csv into memory. I guess that a Python dictionary is the proper data structure for this case.
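One possible way to do that, as a sketch only: key the dictionary by the (IDNumber, Word) pair so that each lookup is a single hash access. The column names come from the question. The function names load_dictionary and proportion_for, and the default of 0 for a missing pair, are assumptions made for illustration.

```python
import csv

def load_dictionary(csv_path):
    """Read the CSV once into a dict keyed by (IDNumber, Word)."""
    lookup = {}
    with open(csv_path, newline='', encoding='utf-8') as csvfile:
        for row in csv.DictReader(csvfile):
            # If a pair occurs in more than one row, the last row wins.
            lookup[(row['IDNumber'], row['Word'])] = row['CurrentProportion']
    return lookup

def proportion_for(lookup, id_number, word, default=0):
    """Return the stored proportion for a pair, or `default` if it is absent."""
    # strip() removes the trailing newline left by readlines() or file iteration
    return lookup.get((id_number.strip(), word.strip()), default)
```

With the lookup built once, the nested loops over IDs.txt and dictionary_WordsOfInterest.txt only need to call proportion_for for each pair and no longer reopen the CSV file. The trade-off is memory: the whole CSV has to fit in RAM.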
Upvotes: 2