Reputation: 1
I am trying to do a group-by and count in Python, but the lines do not seem to get grouped for some reason.
I am using Python 2.7.
#!/usr/bin/env python
counts = {}
logfile = open("/tmp/test.out", "r")
for line in logfile:
    if line.startswith("20") in line:
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
            counts[substr] += 1
for substr, count in counts.items():
    print(count,substr)
I would like output like the following, with one count per group:
6 2019-06-17T00
13 2019-06-17T01
9 2019-06-17T02
7 2019-06-17T03
6 2019-06-17T04
Upvotes: 0
Views: 185
Reputation: 662
You have the increment of the substring's count indented one block too far:
for line in logfile:
    # startswith() already returns a bool; "<bool> in line" raises a TypeError,
    # so the trailing "in line" is dropped
    if line.startswith("20"):
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
        # De-indented so it runs for every matching line
        counts[substr] += 1
# Print output only after loop completes
for substr, count in counts.items():
    print(count,substr)
Before, the increment ran only when the substring was not yet in the counts dictionary, so each key was incremented exactly once and every count stayed at 1.
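To check the fix without the real log file, here is a minimal, self-contained sketch of the same loop. The function name `count_by_hour` and the sample lines are made up for illustration and stand in for the contents of /tmp/test.out. It also sorts the keys and formats each line explicitly, because on Python 2.7 `print(count, substr)` prints a tuple and dictionary order is arbitrary.

```python
def count_by_hour(lines):
    """Count lines per 13-character timestamp prefix, e.g. '2019-06-17T00'."""
    counts = {}
    for line in lines:
        if line.startswith("20"):
            key = line.strip()[0:13]
            if key not in counts:
                counts[key] = 0   # first time this prefix is seen
            counts[key] += 1      # runs for every matching line
    return counts


# Hypothetical sample lines standing in for the real log file
sample = [
    "2019-06-17T00:01:10 first\n",
    "2019-06-17T00:15:42 second\n",
    "2019-06-17T01:03:00 third\n",
    "not a timestamp line\n",
]

counts = count_by_hour(sample)
for key in sorted(counts):
    # Explicit formatting gives "2 2019-06-17T00" on both Python 2.7 and 3
    print("%d %s" % (counts[key], key))
```

With the increment left inside the `if key not in counts:` block, both prefixes in this sample would report a count of 1.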
Upvotes: 2
Reputation: 67
counts = {}
logfile = open("/tmp/test.out", "r")
for line in logfile:
    # Only count lines that begin with a "20xx" timestamp
    if line.startswith("20"):
        seq = line.strip()
        substr = seq[0:13]
        if substr not in counts:
            counts[substr] = 0
        # Increment for every matching line, not just the first one
        counts[substr] += 1
for substr, count in counts.items():
    print(count,substr)
I think this should work.
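As an alternative sketch, the same tally can be written with `collections.Counter` from the standard library (available since Python 2.7), which removes the need for the "not in counts" check entirely. The helper name `count_prefixes`, its `width` parameter, and the sample lines are assumptions for illustration, not part of the original script.

```python
from collections import Counter


def count_prefixes(lines, width=13):
    """Tally the first `width` characters of each line that starts with '20'."""
    return Counter(
        line.strip()[:width] for line in lines if line.startswith("20")
    )


# Hypothetical sample lines; in the original script these come from the log file
sample_lines = [
    "2019-06-17T02:00:01 a\n",
    "2019-06-17T02:30:09 b\n",
    "2019-06-17T02:59:59 c\n",
    "2019-06-17T03:00:00 d\n",
    "header line\n",
]

hourly = count_prefixes(sample_lines)
for prefix in sorted(hourly):
    print("%d %s" % (hourly[prefix], prefix))
```

To run it against the real file, something like `count_prefixes(open("/tmp/test.out"))` should work, since a file object yields its lines when iterated.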
Upvotes: 0