Justin Carrey

Reputation: 3841

Challenging way of counting entries of a file dynamically

I am facing a strange problem, and despite trying many times, I have not been able to work out the logic and the code for it.

I have a file in the format below:

aa:bb:cc dd:ee:ff 100  ---------->line1
aa:bb:cc dd:ee:ff 101  ---------->line2
dd:ee:ff aa:bb:cc 230  ---------->line3
dd:ee:ff aa:bb:cc 231  ---------->line4
dd:ee:ff aa:bb:cc 232  ---------->line5
aa:bb:cc dd:ee:ff 102  ---------->line6
aa:bb:cc dd:ee:ff 103  ---------->line7
aa:bb:cc dd:ee:ff 108  ---------->line8
dd:ee:ff aa:bb:cc 233  ---------->line9  
gg:hh:ii jj:kk:ll 450  ---------->line10
jj:kk:ll gg:hh:ii 600  ---------->line11

My program should read the file line by line. In line1 and line2, the column 1 values are equal and the column 2 values are equal. The third column is a sequence number, which is never the same for any two lines.
Since line1 and line2 are identical except for their sequence numbers, which differ by exactly 1, I should read those two lines together and write their count, 2, to an output file. Lines 6 and 7 are the same as lines 1 and 2 and also have consecutive sequence numbers, but line3, line4 and line5, which have different column 1 and column 2 entries, come between them. Hence lines 1 & 2 and lines 6 & 7 must not be grouped together. Also, the sequence numbers of lines 7 and 8 differ by more than 1, so line 8 must be counted as a separate entry rather than together with lines 6 and 7, even though lines 6, 7 and 8 have the same first two columns. For the file above, the output file should therefore contain: 2 3 2 1 1 1 1
I hope the question is clear; if not, I will clarify anything that is needed.
As you can see, this is a tricky problem. I tried using a dictionary, as that is the only data structure I know, but I could not get the logic to work. Please help me solve this problem.
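The rule described above (a run continues only while columns 1 and 2 stay the same and the sequence number goes up by exactly 1) can be sketched in Python 3 as a single pass that remembers only the previous line. This is a minimal sketch, assuming whitespace-separated fields and an integer third column; `run_counts` and `sample` are hypothetical names, and the sample omits the "---------->lineN" annotations.

```python
def run_counts(lines):
    """Return the length of each run of consecutive lines that share
    columns 1-2 and whose sequence numbers (column 3) rise by exactly 1."""
    counts = []
    prev_pair = prev_seq = None
    for line in lines:
        fields = line.split()
        if len(fields) < 3:
            continue                      # skip blank or malformed lines
        pair, seq = (fields[0], fields[1]), int(fields[2])
        if counts and pair == prev_pair and seq == prev_seq + 1:
            counts[-1] += 1               # same pair, next sequence number: extend the run
        else:
            counts.append(1)              # anything else starts a new run
        prev_pair, prev_seq = pair, seq
    return counts


# Hypothetical in-memory copy of the sample file.
sample = """\
aa:bb:cc dd:ee:ff 100
aa:bb:cc dd:ee:ff 101
dd:ee:ff aa:bb:cc 230
dd:ee:ff aa:bb:cc 231
dd:ee:ff aa:bb:cc 232
aa:bb:cc dd:ee:ff 102
aa:bb:cc dd:ee:ff 103
aa:bb:cc dd:ee:ff 108
dd:ee:ff aa:bb:cc 233
gg:hh:ii jj:kk:ll 450
jj:kk:ll gg:hh:ii 600
""".splitlines()

print(run_counts(sample))   # [2, 3, 2, 1, 1, 1, 1]
```

Because only the previous line matters, the same function works on an open file object (`run_counts(open(path))`) without loading the whole file into memory.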

Upvotes: 0

Views: 114

Answers (4)

Ashwini Chaudhary

Reputation: 250871

with open("abc") as f:
    #read the first line and set the number from it as the value of `prev`
    num, col4 = next(f).rsplit(None,2)[-2:] #use `str.rsplit` for minimum splits
    prev  = int(num)
    col4_prev = col4
    count = 1                               #initialize `count` to 1
    for lin in f:
        num, col4 = lin.rsplit(None,2)[-2:]
        num  = int(num)                    
        if num - prev == 1:             #if current `num` - `prev` == 1
            count+=1                        # increment `count` 
            prev = num                      # set `prev` = `num`
        else:
            print count,col4_prev       #else print `count` or write it to a file 
            count = 1                       #reset `count` to 1
            prev = num                      #set `prev` = `num`
            col4_prev = col4

    if num - prev != 1:
        print count,col4

output:

2 400
3 600
2 400
1 111
1 500
1 999
1 888

Where 'abc' contains:

aa:bb:cc dd:ee:ff 100 400
aa:bb:cc dd:ee:ff 101 400 
dd:ee:ff aa:bb:cc 230 600 
dd:ee:ff aa:bb:cc 231 600
dd:ee:ff aa:bb:cc 232 600
aa:bb:cc dd:ee:ff 102 400
aa:bb:cc dd:ee:ff 103 400
aa:bb:cc dd:ee:ff 108 111 
dd:ee:ff aa:bb:cc 233 500 
gg:hh:ii jj:kk:ll 450 999
jj:kk:ll gg:hh:ii 600 888 

Upvotes: 1

Aya

Reputation: 41940

You could use itertools.groupby()...

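from io import StringIO
import itertools

data = 'aa:bb:cc dd:ee:ff 100\n' \
       'aa:bb:cc dd:ee:ff 101\n' \
       'dd:ee:ff aa:bb:cc 230\n' \
       'dd:ee:ff aa:bb:cc 231\n' \
       'dd:ee:ff aa:bb:cc 232\n' \
       'aa:bb:cc dd:ee:ff 102\n' \
       'aa:bb:cc dd:ee:ff 103\n' \
       'aa:bb:cc dd:ee:ff 108\n' \
       'dd:ee:ff aa:bb:cc 233\n' \
       'gg:hh:ii jj:kk:ll 450\n' \
       'jj:kk:ll gg:hh:ii 600\n'

sio = StringIO(data)                        # stands in for the open file
counter = itertools.count()                 # 0, 1, 2, ... one value per line

def run_key(line):
    cols, seq = line.rsplit(None, 1)        # "aa:bb:cc dd:ee:ff", "100"
    # within a run of consecutive sequence numbers, seq minus the line's
    # index is constant, so the key changes exactly where a run breaks
    return cols, int(seq) - next(counter)

print([len(list(g)) for _, g in itertools.groupby(sio, key=run_key)])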

...which prints...

[2, 3, 2, 1, 1, 1, 1]
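The grouping key above works because, inside a run of consecutive sequence numbers, subtracting each line's position from its sequence number gives the same value on every line, and that value changes as soon as the run breaks. A small worked check on the sample's columns (values copied from the question; `seqs`, `pairs`, `offsets` and `run_lengths` are illustrative names):

```python
import itertools

# Column 3 of the question's sample, in file order.
seqs = [100, 101, 230, 231, 232, 102, 103, 108, 233, 450, 600]
# Columns 1-2 of the same eleven lines.
pairs = (["aa:bb:cc dd:ee:ff"] * 2 + ["dd:ee:ff aa:bb:cc"] * 3
         + ["aa:bb:cc dd:ee:ff"] * 3
         + ["dd:ee:ff aa:bb:cc", "gg:hh:ii jj:kk:ll", "jj:kk:ll gg:hh:ii"])

# Sequence number minus position: constant while numbers rise by exactly 1.
offsets = [seq - i for i, seq in enumerate(seqs)]
print(offsets)   # [100, 100, 228, 228, 228, 97, 97, 101, 225, 441, 590]

# Grouping adjacent equal (pair, offset) tuples reproduces the run lengths.
run_lengths = [len(list(g)) for _, g in itertools.groupby(zip(pairs, offsets))]
print(run_lengths)   # [2, 3, 2, 1, 1, 1, 1]
```

Line 8 (108) gets offset 101 while lines 6 and 7 get 97, which is why it forms its own group; the column pair in the key is what keeps runs of different pairs apart.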

Upvotes: 0

Stephan

Reputation: 17971

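with open('filename.txt') as entries:
    prev_key = prev_seq = None
    count = 0
    for line in entries:
        fields = line.split()
        key, seq = fields[:2], int(fields[2])   # columns 1-2 and the sequence number
        if key == prev_key and seq == prev_seq + 1:
            count += 1                          # same columns, next sequence number
        else:
            if count:
                print(count)                    # a run has ended: print its length
            count = 1
        prev_key, prev_seq = key, seq
    if count:
        print(count)                            # print the last run

That should do it. Here's how it works: open the file and loop over its lines. For each line, compare its first two columns and its sequence number with those of the previous line. If the columns match and the sequence number is exactly one higher, add one to the match counter; otherwise, print the count for the run that just ended and reset the counter to 1. At the end of each iteration, save the current line's values as the previous ones, and after the loop print the count for the last run.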

Upvotes: 0

J0HN

Reputation: 26911

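from collections import defaultdict

# note: this totals every occurrence of each (column 1, column 2) pair,
# so separate runs of the same pair are merged into one count
results = defaultdict(int)
with open("input_file.txt") as input_file:
    for line in input_file:
        columns = line.split()
        if not columns:
            continue                            # skip blank lines
        key = " ".join(columns[:2])
        results[key] += 1

with open("output_file.txt", "w") as output_file:
    for key, count in results.items():
        output_file.write("{0} -> {1}\n".format(key, count))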
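A dictionary keyed on the first two columns, as above (and as the question mentions trying), totals each pair over the whole file rather than counting consecutive runs. A quick check with `collections.Counter` on the question's sample, held in a hypothetical in-memory list `sample`, shows the difference: the two aa:bb:cc/dd:ee:ff runs and the lone line 8 fold into a single total of 5, where the question expects separate counts of 2, 2 and 1.

```python
from collections import Counter

# Question's sample data, line annotations omitted; `sample` is illustrative.
sample = [
    "aa:bb:cc dd:ee:ff 100",
    "aa:bb:cc dd:ee:ff 101",
    "dd:ee:ff aa:bb:cc 230",
    "dd:ee:ff aa:bb:cc 231",
    "dd:ee:ff aa:bb:cc 232",
    "aa:bb:cc dd:ee:ff 102",
    "aa:bb:cc dd:ee:ff 103",
    "aa:bb:cc dd:ee:ff 108",
    "dd:ee:ff aa:bb:cc 233",
    "gg:hh:ii jj:kk:ll 450",
    "jj:kk:ll gg:hh:ii 600",
]

# Key on the first two columns only, as the dictionary approach does.
totals = Counter(" ".join(line.split()[:2]) for line in sample)
print(dict(totals))
# {'aa:bb:cc dd:ee:ff': 5, 'dd:ee:ff aa:bb:cc': 4, 'gg:hh:ii jj:kk:ll': 1, 'jj:kk:ll gg:hh:ii': 1}
```

To get per-run counts, the loop has to compare each line with the previous one instead of looking up a running total per pair.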

Upvotes: 0
