Reputation: 345
I'd like to count specific things from a file, i.e. how many times "--undefined--"
appears. Here is a piece of the file's content:
jo:ns 76.434
pRE 75.417
zi: 75.178
dEnt --undefined--
ba --undefined--
I tried something like this, but it doesn't work:
with open("v3.txt", 'r') as infile:
    data = infile.readlines().decode("UTF-8")
count = 0
for i in data:
    if i.endswith("--undefined--"):
        count += 1
print count
Do I have to implement, say, a dictionary of tuples to tackle this, or is there an easier solution?
EDIT:
The word in question appears only once in a line.
Upvotes: 1
Reputation: 5372
Quoting Raymond Hettinger, "There must be a better way":
from collections import Counter
counter = Counter()
words = ('--undefined--', 'otherword', 'onemore')
with open("v3.txt", 'r') as f:
    lines = f.readlines()
    for line in lines:
        for word in words:
            if word in line:
                counter.update((word,))  # note the single element tuple
print(counter)
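For comparison, here is a minimal Python 3 sketch of the same Counter approach. It assumes the lines come from any iterable of strings (an open file object works the same way, without calling readlines()); `count_words` and `sample` are hypothetical names used only for illustration. Counter.update() accepts any iterable of elements, so a generator can replace the single-element tuple.

```python
from collections import Counter

def count_words(lines, words):
    """Count how many lines contain each word (hypothetical helper)."""
    counter = Counter()
    for line in lines:
        # Add one to every tracked word that occurs somewhere in this line
        counter.update(word for word in words if word in line)
    return counter

# Hypothetical sample: a few lines from the question, as readlines() would return them
sample = [
    "jo:ns 76.434\n",
    "dEnt --undefined--\n",
    "ba --undefined--\n",
]
print(count_words(sample, ("--undefined--", "otherword")))
```

Words that never occur are simply absent from the result, and indexing a Counter with a missing key returns 0 rather than raising KeyError.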
Upvotes: 1
Reputation: 77902
When reading a file line by line, each line ends with the newline character:
>>> with open("blookcore/models.py") as f:
...     lines = f.readlines()
...
>>> lines[0]
'# -*- coding: utf-8 -*-\n'
>>>
so your endswith() test can't work as written; you have to strip the line first:
if i.strip().endswith("--undefined--"):
    count += 1
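A quick check of this point, using one line from the question's sample data as readlines() would return it (the variable name `line` is just for illustration):

```python
# A line as returned by readlines(): the trailing newline is still there
line = "dEnt --undefined--\n"

print(line.endswith("--undefined--"))          # False: the line ends with "\n"
print(line.strip().endswith("--undefined--"))  # True once whitespace is stripped
```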
Reading a whole file into memory is usually a bad idea: even if the file fits in memory, it still uses resources for no good reason. Python's file objects are iterable, so you can just loop over the file. Finally, you can specify which encoding to use when opening the file (instead of decoding manually), using the codecs module in Python 2 or open() directly in Python 3:
# py3
with open("your/file.text", encoding="utf-8") as f:
# py2:
import codecs
with codecs.open("your/file.text", encoding="utf-8") as f:
Then just use the builtin sum() and a generator expression:
result = sum(line.strip().endswith("whatever") for line in f)
This relies on the fact that booleans are integers with values 0 (False) and 1 (True).
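Putting these pieces together, a minimal Python 3 sketch might look like the following. The helper name `count_undefined` is hypothetical, and the temporary file only stands in for the real v3.txt so that the example is self-contained.

```python
import os
import tempfile

def count_undefined(path, marker="--undefined--"):
    """Count lines ending with marker, reading the file lazily (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        # Each comparison yields a bool; sum() adds True as 1 and False as 0
        return sum(line.strip().endswith(marker) for line in f)

# Write the sample data from the question to a temporary file for the demo
sample = "jo:ns 76.434\npRE 75.417\nzi: 75.178\ndEnt --undefined--\nba --undefined--\n"
with tempfile.NamedTemporaryFile("w", encoding="utf-8", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

try:
    result = count_undefined(path)
    print(result)  # 2
finally:
    os.remove(path)
```

Because the file object is iterated directly, only one line is held in memory at a time.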
Upvotes: 1
Reputation: 6376
You can read all the data into one string, split that string into a list, and count the occurrences of the word in that list:
with open('afile.txt', 'r') as myfile:
    data = myfile.read().replace('\n', ' ')
print(data.split(' ').count("--undefined--"))
or count directly on the string:
print(data.count("--undefined--"))
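The two variants are not always equivalent. A small sketch, assuming a hypothetical third line where the marker is glued to other text: str.count() counts every non-overlapping occurrence of the substring, while splitting into tokens only counts exact whole words. Calling split() with no argument splits on any whitespace, including newlines, so the replace('\n', ' ') step is not needed in that variant.

```python
# The third line is hypothetical: the marker is followed directly by more text
data = "dEnt --undefined--\nba --undefined--\nxy --undefined--foo\n"

# str.count matches the substring anywhere, even inside a longer token
print(data.count("--undefined--"))          # 3
# split() with no argument splits on all whitespace and keeps whole tokens only
print(data.split().count("--undefined--"))  # 2
```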
Upvotes: 3
Reputation: 369
Or don't limit yourself to .endswith(); use the in operator:
count = 0
with open('v3.txt', 'r') as infile:
    data = infile.readlines()
for line in data:
    if '--undefined--' in line:
        count += 1
print(count)
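A sketch of how the in operator and endswith() can differ, assuming a hypothetical line where the marker is followed by more text. Both approaches count matching lines rather than individual occurrences, which is fine here because the word appears at most once per line.

```python
lines = [
    "dEnt --undefined--\n",
    "ba --undefined-- (see note)\n",  # hypothetical line: marker not at the end
    "zi: 75.178\n",
]

# in matches anywhere in the line, so the second line is counted too
with_in = sum(1 for line in lines if "--undefined--" in line)
# endswith() requires the marker at the very end (after stripping the newline)
with_endswith = sum(1 for line in lines if line.strip().endswith("--undefined--"))

print(with_in, with_endswith)  # 2 1
```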
Upvotes: 1
Reputation: 483
readlines() returns the list of lines, but they are not stripped (i.e. they still contain the newline character).
Either strip them first:
data = [line.strip() for line in data]
or check for "--undefined--\n" instead (keeping in mind that the last line of a file may not end with a newline):
if line.endswith("--undefined--\n"):
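One caveat with matching the newline explicitly: if the last line has no trailing "\n", it is missed. A minimal sketch, using io.StringIO as a stand-in for a real file whose last line has no trailing newline:

```python
import io

# io.StringIO stands in for a file whose last line has no trailing newline
data = io.StringIO("dEnt --undefined--\nba --undefined--").readlines()

# Only the first line ends with "\n", so the explicit check misses the last one
exact = sum(line.endswith("--undefined--\n") for line in data)
# Stripping just the newline first handles both cases
stripped = sum(line.rstrip("\n").endswith("--undefined--") for line in data)

print(exact, stripped)  # 1 2
```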
Alternatively, consider the string .count() method on the whole file contents:
file_contents.count("--undefined--")
Upvotes: 1