manxing

Reputation: 3325

Compare values in a file with Python

Here is a sample of my data from a .txt file:

1322484979.322313000    85.24.168.19    QQlb-j7itDQ
1322484981.070116000    83.233.56.133   Ne8Bb1d5oyc
1322484981.128791000    83.233.56.133   Ne8Bb1d5oyc
1322484981.431075000    83.233.56.133   Ne8Bb1d5oyc
1322484985.210652000    83.233.57.136   QWUiCAE4E7U

The first column is a timestamp, the second is an IP address, and the third is a hash value.
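Reading one line into its three columns can be sketched as follows. This assumes the columns are always separated by whitespace and that the timestamp is Unix time in seconds; Row and parse_line are hypothetical names chosen for this example. A blank line would make the unpacking raise ValueError, so real input may need a guard.

```python
from typing import NamedTuple


class Row(NamedTuple):
    """One parsed line of the file (assumed whitespace-separated)."""
    timestamp: float  # assumption: Unix time in seconds
    ip: str
    hash_value: str


def parse_line(line: str) -> Row:
    """Split a line into its three columns and convert the timestamp."""
    timestamp, ip, hash_value = line.split()  # ValueError on blank lines
    return Row(float(timestamp), ip, hash_value)


row = parse_line("1322484981.070116000    83.233.56.133   Ne8Bb1d5oyc\n")
print(row.ip, row.hash_value)  # 83.233.56.133 Ne8Bb1d5oyc
```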

I want to check whether two or more successive rows have the same IP address and hash value. If they do, I need to subtract the first timestamp of the duplicated rows from the last one; in this case, that is 1322484981.431075000 - 1322484981.070116000.
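The subtraction for the sample run works out as below. The timestamps carry 19 significant digits, more than a Python float can represent exactly, so this sketch uses decimal.Decimal when the exact difference matters; a plain float is still accurate enough for comparing against a 5-second threshold.

```python
from decimal import Decimal

first = "1322484981.070116000"  # earliest row of the Ne8Bb1d5oyc run
last = "1322484981.431075000"   # latest row of the same run

# Decimal keeps every digit from the string, so the difference is exact.
exact = Decimal(last) - Decimal(first)
print(exact)  # 0.360959000

# Floats near 1.3e9 are spaced about 0.24 microseconds apart, so each
# timestamp is rounded and this difference is only approximately 0.360959.
approx = float(last) - float(first)
print(exact < 5, approx < 5)  # True True
```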

If the result is less than 5, I will only keep the first row (the earliest) in the file.

If the result is more than 5, I will keep the first and the last duplicated rows and delete the rows between them.
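The two rules above can be sketched with itertools.groupby, which only groups adjacent items that share a key and therefore matches "successive rows". This is a minimal sketch, not a definitive implementation: filter_rows is a hypothetical helper, it reads all rows into memory, and it assumes a span of exactly 5 seconds should be treated like "more than 5", since the question does not say.

```python
from itertools import groupby


def filter_rows(lines, threshold=5.0):
    """Return the lines to keep, applying the first/last rule to each run.

    A run is a sequence of successive rows with the same IP and hash.
    Assumption: a span of exactly `threshold` seconds counts as "more than".
    """
    # Parse non-blank lines into (timestamp, ip, hash, original line).
    rows = []
    for line in lines:
        fields = line.split()
        if len(fields) == 3:
            rows.append((float(fields[0]), fields[1], fields[2], line))

    kept = []
    # groupby starts a new group whenever the (ip, hash) key changes,
    # so only adjacent rows end up in the same run.
    for _key, group in groupby(rows, key=lambda r: (r[1], r[2])):
        run = list(group)
        first, last = run[0], run[-1]
        kept.append(first[3])  # the earliest row is always kept
        if len(run) > 1 and last[0] - first[0] >= threshold:
            kept.append(last[3])  # long run: also keep the latest row
    return kept


sample = """1322484979.322313000    85.24.168.19    QQlb-j7itDQ
1322484981.070116000    83.233.56.133   Ne8Bb1d5oyc
1322484981.128791000    83.233.56.133   Ne8Bb1d5oyc
1322484981.431075000    83.233.56.133   Ne8Bb1d5oyc
1322484985.210652000    83.233.57.136   QWUiCAE4E7U
""".splitlines(keepends=True)

for line in filter_rows(sample):
    print(line, end="")
```

With the sample data, the three Ne8Bb1d5oyc rows span about 0.36 seconds, so only the first of them is kept and the output is rows 1, 2 and 5.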

Since I'm pretty new to Python, this problem is a bit complicated for me. I don't know which functions I need; can anyone help a little?

Upvotes: 2

Views: 177

Answers (1)

Cédric Julien

Reputation: 80811

In a basic way, it could look like this:

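with open("data.txt") as data:
    last_time = 0.0
    last_ip = None
    last_hash = None

    for line in data:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        timestamp, ip, hash_value = fields
        # Compare with the previous row: same IP, same hash, under 5 s apart.
        if ip == last_ip and hash_value == last_hash and float(timestamp) - float(last_time) < 5.0:
            print("Remove", line.rstrip())
        else:
            print("Keep", line.rstrip())
        last_time, last_ip, last_hash = timestamp, ip, hash_value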
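The loop above measures the gap to the previous row rather than the span of the whole run, so a run that stretches past 5 seconds in small steps loses all but its first row, and a run whose rows are each more than 5 seconds apart keeps every row. One way to follow the question's first/last rule in a single pass is to remember the first and the most recent row of the current run and decide what to write when the run ends. This is a minimal sketch: collapse_runs is a hypothetical helper, and it assumes a span of exactly 5 seconds counts as "more than 5".

```python
import io


def collapse_runs(src, dst, threshold=5.0):
    """Stream rows from src to dst, collapsing runs of successive duplicates.

    A run (successive rows with the same IP and hash) keeps its first row,
    plus its last row if it spans at least `threshold` seconds
    (assumption: exactly `threshold` counts as "more than").
    """
    first = last = None  # (timestamp, ip, hash, line) of the current run

    def flush():
        # Emit the current run's surviving rows; called when the run ends.
        if first is None:
            return
        dst.write(first[3])
        if last is not first and last[0] - first[0] >= threshold:
            dst.write(last[3])

    for line in src:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        row = (float(fields[0]), fields[1], fields[2], line.rstrip("\n") + "\n")
        if first is not None and row[1:3] == first[1:3]:
            last = row  # same IP and hash: extend the current run
        else:
            flush()  # the previous run has ended
            first = last = row  # start a new run
    flush()  # the final run has no successor to end it


SAMPLE = (
    "1322484979.322313000    85.24.168.19    QQlb-j7itDQ\n"
    "1322484981.070116000    83.233.56.133   Ne8Bb1d5oyc\n"
    "1322484981.128791000    83.233.56.133   Ne8Bb1d5oyc\n"
    "1322484981.431075000    83.233.56.133   Ne8Bb1d5oyc\n"
    "1322484985.210652000    83.233.57.136   QWUiCAE4E7U\n"
)

out = io.StringIO()
collapse_runs(io.StringIO(SAMPLE), out)
print(out.getvalue(), end="")
```

To rewrite the real file, pass open file objects instead of StringIO: data.txt opened for reading and a separate output file opened for writing. Writing to a different file avoids truncating the input while it is still being read.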

Upvotes: 3
