Reputation: 35
I am trying to count entries in a text file but am having difficulty. Each line is one entry, and if the term "ADALIMUMAB" shows up anywhere in the line, that line counts as one. If the term shows up twice in the same line, it should still count only once. Here is an example of lines in the text file.
101700392$10170039$3$I$BUDESONIDE.$BUDESONIDE$1$Oral$9 MG, DAILY$$$$$$$$9$MG$$
101700392$10170039$4$C$ADALIMUMAB$ADALIMUMAB$1$$UNK$$$$$$$$$$$
102117144$10211714$1$PS$HUMIRA$ADALIMUMAB$1$Subcutaneous$$$$$N$ NOT AVAILABLE,NOT
I currently have this working:
fDRUG14Q3 = open("DRUG14Q3.txt")
data = fDRUG14Q3.read()
occurencesDRUG14Q3 = data.count("ADALIMUMAB")
But it will count line 2 in the example above as 2 entries rather than one.
Upvotes: 1
Views: 40
Reputation: 92461
You can pass a generator expression to sum(). For each line, 'ADALIMUMAB' in line evaluates to either True (1) or False (0), and sum() adds these up. In other words, you are counting how many lines contain the term:
with open(path, 'r') as f:
    total = sum('ADALIMUMAB' in line for line in f)
print(total)
# 2
This has the added benefit of not reading the whole file into memory first, because the file object is iterated one line at a time.
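To make the difference from data.count() concrete, here is a minimal, self-contained sketch that runs both approaches on the three sample lines from the question. The helper name count_lines_containing and the temporary file are illustrative assumptions, not part of the original code; in practice you would pass the path to DRUG14Q3.txt instead.

```python
import os
import tempfile


def count_lines_containing(path, term):
    """Return how many lines of the file at `path` contain `term` at least once.

    Hypothetical helper name, introduced only for this illustration.
    """
    with open(path, "r") as f:
        # `term in line` is True or False; sum() treats these as 1 and 0.
        return sum(term in line for line in f)


# The three example lines from the question.
sample = (
    "101700392$10170039$3$I$BUDESONIDE.$BUDESONIDE$1$Oral$9 MG, DAILY$$$$$$$$9$MG$$\n"
    "101700392$10170039$4$C$ADALIMUMAB$ADALIMUMAB$1$$UNK$$$$$$$$$$$\n"
    "102117144$10211714$1$PS$HUMIRA$ADALIMUMAB$1$Subcutaneous$$$$$N$ NOT AVAILABLE,NOT\n"
)

# Write the sample to a temporary file so the example runs on its own.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

line_count = count_lines_containing(sample_path, "ADALIMUMAB")  # one per matching line
raw_count = sample.count("ADALIMUMAB")  # every occurrence, as in the question's code
os.remove(sample_path)

print(line_count, raw_count)
```

Note that this is a plain substring test, so a line containing a longer word that includes the term would also count as a match.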
Upvotes: 1