user3319356

Reputation: 173

python, compare two files and get difference

I have two files: f1 is user input and f2 is a database. I want to check whether the strings from f1 are in the database (f2) and print the ones that don't exist in f2. My code is not working correctly. Here is f1:

rbs003491
rbs003499
rbs003531
rbs003539
rbs111111

Here is f2:

AHPTUR13,rbs003411 
AHPTUR13,rbs003419 
AHPTUR13,rbs003451 
AHPTUR13,rbs003459 
AHPTUR13,rbs003469 
AHPTUR13,rbs003471 
AHPTUR13,rbs003479 
AHPTUR13,rbs003491 
AHPTUR13,rbs003499 
AHPTUR13,rbs003531 
AHPTUR13,rbs003539 
AHPTUR13,rbs003541 
AHPTUR13,rbs003549 
AHPTUR13,rbs003581 

In this case it should print rbs111111, because it is not in f2. My code is:

with open(c,'r') as f1:
    s1 = set(x.strip() for x in f1)
    print s1
    with open("/tmp/ARNE/blt",'r') as f2:
        for line in f2:
            if line not in s1:
                print line

Upvotes: 0

Views: 135

Answers (3)

falsetru

Reputation: 368904

If you only care about the second part of each line (rbs003411 from AHPTUR13,rbs003411):

with open(user_input_path) as f1, open('/tmp/ARNE/blt') as f2:
    not_found = set(f1.read().split())
    for line in f2:
        _, found = line.strip().split(',')
        not_found.discard(found)  # remove found word
    print not_found
    # for x in not_found:
    #     print x
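The same discard idea can be sketched as a small function that takes any iterables of lines, so it can be tried without the real files. The name `missing_codes` and the sample lists are assumptions for illustration, and the sketch uses Python 3's `print()` rather than the Python 2 print statement used above. It also takes the text after the last comma, so blank lines or lines without a comma do not raise an error.

```python
def missing_codes(input_lines, db_lines):
    """Return the codes from input_lines that never appear in db_lines."""
    # Hypothetical helper: start by assuming every input code is missing.
    not_found = set(line.strip() for line in input_lines if line.strip())
    for line in db_lines:
        # Keep only the text after the last comma, e.g. "rbs003411".
        code = line.strip().rpartition(',')[2]
        not_found.discard(code)  # discard() is a no-op for unknown codes
    return not_found


# Shortened sample data in the same shape as f1 and f2 from the question.
f1_sample = ["rbs003491\n", "rbs003499\n", "rbs111111\n"]
f2_sample = ["AHPTUR13,rbs003491 \n", "AHPTUR13,rbs003499 \n", "\n"]
print(missing_codes(f1_sample, f2_sample))
```

Open file objects can be passed in place of the lists, since both are iterated line by line.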

Upvotes: 1

Kasravnd

Reputation: 107287

You need to compare only the last part of each line in f2, not the whole line. Split each f2 line on the comma and take the last part (x.strip().split(',')[-1]). Also, if you want to check whether the strings from f1 are in the database (f2), your logic is reversed: you need to build the set from f2 and then loop over f1:

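with open(c, 'r') as f1, open("/tmp/ARNE/blt", 'r') as f2:
    # Collect the part after the last comma of every database line
    s1 = set(x.strip().split(',')[-1] for x in f2)
    print s1
    for line in f1:
        if line.strip() not in s1:
            print line.strip()  # strip to avoid printing the newline twice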

Upvotes: 0

legaultmarc

Reputation: 127

Your line variable in the for loop will contain something like "AHPTUR13,rbs003411", but you are only interested in the second part. You should do something like:

for line in f2:
    line = line.strip().split(",")[1]
    if line not in s1:
        print line

Upvotes: 0
