Reputation: 3841
I am facing a strange problem: despite many attempts, I have not been able to work out the logic or write proper code for it.
I have a file in the format below:
aa:bb:cc dd:ee:ff 100 ---------->line1
aa:bb:cc dd:ee:ff 101 ---------->line2
dd:ee:ff aa:bb:cc 230 ---------->line3
dd:ee:ff aa:bb:cc 231 ---------->line4
dd:ee:ff aa:bb:cc 232 ---------->line5
aa:bb:cc dd:ee:ff 102 ---------->line6
aa:bb:cc dd:ee:ff 103 ---------->line7
aa:bb:cc dd:ee:ff 108 ---------->line8
dd:ee:ff aa:bb:cc 233 ---------->line9
gg:hh:ii jj:kk:ll 450 ---------->line10
jj:kk:ll gg:hh:ii 600 ---------->line11
My program should read the file line by line. In the first and second lines, the column 1 and column 2 values are equal. The third column is a sequence number, which is never the same for any two lines.
Since line 1 and line 2 are the same except that their sequence numbers differ by exactly 1, I should read those two lines first and write their count, 2, to an output file. Lines 6 and 7 have the same first two columns as lines 1 and 2 and also have consecutive sequence numbers, but lines 3, 4 and 5, which have different column 1 and column 2 entries, come in between them. Hence lines 1 and 2 should not be grouped together with lines 6 and 7. So, in the output file, I should get a result like 2 3 2 1 1 1 1. One more thing: the sequence numbers of lines 7 and 8 differ by more than 1. Hence line 8 should be counted as a separate entry, not together with lines 6 and 7, even though lines 6, 7 and 8 have the same first two columns.
I hope the question is clear. If not, I will clarify anything about it.
As you can see, this is a very complicated problem. I tried using a dictionary, as that is the only data structure I know, but none of my attempts worked. Please help me solve this problem.
Upvotes: 0
Views: 114
Reputation: 250871
with open("abc") as f:
    #read the first line and set the number from it as the value of `prev`
    num, col4 = next(f).rsplit(None,2)[-2:] #use `str.rsplit` for minimum splits
    prev = int(num)
    col4_prev = col4
    count = 1 #initialize `count` to 1
    for lin in f:
        num, col4 = lin.rsplit(None,2)[-2:]
        num = int(num)
        if num - prev == 1: #if current `num` - `prev` == 1
            count+=1 # increment `count`
            prev = num # set `prev` = `num`
        else:
            print count,col4_prev #else print `count` or write it to a file
            count = 1 #reset `count` to 1
            prev = num #set `prev` = `num`
            col4_prev = col4
    if num - prev != 1:
        print count,col4
output:
2 400
3 600
2 400
1 111
1 500
1 999
1 888
Where 'abc' contains:
aa:bb:cc dd:ee:ff 100 400
aa:bb:cc dd:ee:ff 101 400
dd:ee:ff aa:bb:cc 230 600
dd:ee:ff aa:bb:cc 231 600
dd:ee:ff aa:bb:cc 232 600
aa:bb:cc dd:ee:ff 102 400
aa:bb:cc dd:ee:ff 103 400
aa:bb:cc dd:ee:ff 108 111
dd:ee:ff aa:bb:cc 233 500
gg:hh:ii jj:kk:ll 450 999
jj:kk:ll gg:hh:ii 600 888
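To see what the `rsplit(None, 2)[-2:]` call in the code above extracts, here is a small sketch on one line of the sample file. It is written for Python 3 (hence `print(...)`), and the variable names are only illustrative.

```python
line = "aa:bb:cc dd:ee:ff 100 400\n"

# Split on whitespace starting from the right, at most twice.
# The first two columns stay together in a single leading field.
parts = line.rsplit(None, 2)
print(parts)  # ['aa:bb:cc dd:ee:ff', '100', '400']

# The last two fields are the sequence number and the fourth column.
num, col4 = parts[-2:]
print(int(num), col4)  # 100 400
```

Note that this only looks at the last two columns, so the first two columns are not compared when deciding whether a run continues.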
Upvotes: 1
Reputation: 41940
You could use itertools.groupby()
...
from cStringIO import StringIO
import itertools
data = 'aa:bb:cc dd:ee:ff 100\n' \
'aa:bb:cc dd:ee:ff 101\n' \
'dd:ee:ff aa:bb:cc 230\n' \
'dd:ee:ff aa:bb:cc 231\n' \
'dd:ee:ff aa:bb:cc 232\n' \
'aa:bb:cc dd:ee:ff 102\n' \
'aa:bb:cc dd:ee:ff 103\n' \
'aa:bb:cc dd:ee:ff 108\n' \
'dd:ee:ff aa:bb:cc 233\n' \
'gg:hh:ii jj:kk:ll 450\n' \
'jj:kk:ll gg:hh:ii 600\n'
sio = StringIO(data)
print [len(list(g)) for k, g in itertools.groupby(sio, key=lambda x, c=itertools.count(): (x[:-5], int(x[-4:-1])-next(c)))]
...which prints...
[2, 3, 2, 1, 1, 1, 1]
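The one-liner relies on fixed-width slices (`x[:-5]`, `x[-4:-1]`), so it assumes three-digit sequence numbers. Below is a sketch of the same `groupby` idea that parses the columns with `split()` instead. It is written for Python 3, and `run_lengths` and `sample` are hypothetical names used only for illustration.

```python
import itertools


def run_lengths(lines):
    """Count runs of lines that share columns 1-2 and have consecutive sequence numbers."""
    counter = itertools.count()

    def key(line):
        col1, col2, seq = line.split()[:3]
        # Inside a run, the sequence number and the running line index
        # grow together, so their difference stays constant.
        return (col1, col2, int(seq) - next(counter))

    return [len(list(group)) for _, group in itertools.groupby(lines, key=key)]


sample = [
    "aa:bb:cc dd:ee:ff 100",
    "aa:bb:cc dd:ee:ff 101",
    "dd:ee:ff aa:bb:cc 230",
    "dd:ee:ff aa:bb:cc 231",
    "dd:ee:ff aa:bb:cc 232",
    "aa:bb:cc dd:ee:ff 102",
    "aa:bb:cc dd:ee:ff 103",
    "aa:bb:cc dd:ee:ff 108",
    "dd:ee:ff aa:bb:cc 233",
    "gg:hh:ii jj:kk:ll 450",
    "jj:kk:ll gg:hh:ii 600",
]
print(run_lengths(sample))  # [2, 3, 2, 1, 1, 1, 1]
```

This sketch assumes every line has at least three whitespace-separated fields; blank lines would need to be filtered out first.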
Upvotes: 0
Reputation: 17971
entries = open('filename.txt', 'r')
prevLine = ""
count = 1
for line in entries:
    if line == prevLine:
        count += 1
    else:
        print count
        count = 1
    prevLine = line
That should do it. Here's an explanation: first you open the file, then you loop over each line of the file. For each line, you compare it to the previous one. If it is the same as the previous one, you add one to the matches counter. If it is not the same, you print the output and reset the counter. At the end of each loop iteration, you save the current line as the previous line.
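One caveat: a whole-line comparison such as `line == prevLine` only matches identical lines, and in the question's data the sequence number differs on every line. Here is a minimal sketch of the same previous-line loop that compares the first two columns and checks that the sequence number increases by exactly 1. It is written for Python 3, and `count_runs` is a hypothetical name, not something from the question.

```python
def count_runs(lines):
    """Return the length of each run of consecutive, matching lines."""
    counts = []
    prev_key = None
    prev_seq = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        key = (fields[0], fields[1])
        seq = int(fields[2])
        if counts and key == prev_key and seq == prev_seq + 1:
            counts[-1] += 1   # same columns, next sequence number: extend the run
        else:
            counts.append(1)  # anything else starts a new run
        prev_key, prev_seq = key, seq
    return counts


sample = """\
aa:bb:cc dd:ee:ff 100
aa:bb:cc dd:ee:ff 101
dd:ee:ff aa:bb:cc 230
dd:ee:ff aa:bb:cc 231
dd:ee:ff aa:bb:cc 232
aa:bb:cc dd:ee:ff 102
aa:bb:cc dd:ee:ff 103
aa:bb:cc dd:ee:ff 108
dd:ee:ff aa:bb:cc 233
gg:hh:ii jj:kk:ll 450
jj:kk:ll gg:hh:ii 600
"""
print(" ".join(str(n) for n in count_runs(sample.splitlines())))  # 2 3 2 1 1 1 1
```

Because the counts are collected in a list, the last run is included without a separate check after the loop, and an open file object can be passed in place of `sample.splitlines()`.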
Upvotes: 0
Reputation: 26911
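from collections import defaultdict

# total number of lines seen for each (column 1, column 2) pair
results = defaultdict(int)
for line in open("input_file.txt", "r"):
    columns = line.split(" ")
    key = " ".join(columns[:2])
    results[key] += 1
with open("output_file.txt", "w") as output_file:
    # .items() yields (key, count) pairs; iterating the dict alone yields only keys
    for key, count in results.items():
        output_file.write("{0} -> {1}\n".format(key, count))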
Upvotes: 0