Reputation: 21

Finding matches in python 3

I have two files with lots of columns of information about a set of objects, each identified by an object ID. I need to find the objects that appear in both files, but the object IDs are written in two different formats:

12-12-1 in one file will be written as 0012 00012 1 in the other. For instance, in one file I have:

0001 01531 1
0001 01535 1
0001 01538 1

Which corresponds to this in the other:

1-1531-1
1-1535-1
1-1538-1

Something as simple as

matches = open('matches.dat','w')
for j in range(len(file1)):
    for i in range(len(file2)):
        if file1[j] == file2[i]:
            matches.write('{}/n'.format(file1[j]))

doesn't seem to do the trick.

file1 and file2 here are lists that contain all the object IDs from the different files.

What do I add to my code to find the matches?

Upvotes: 2

Answers (2)

Seth

Reputation: 43

A few notes:

- You never close your matches file. Using a with statement closes it automatically, even if an error occurs.

- The newline escape in the last line of your code is wrong: it's \n, not /n.

If your number formatting is always consistent (i.e. the first field is always zero-padded to four digits, the second to five, and the last is never padded), this should work:

with open('matches.dat', 'w') as matches:
    for id1 in file1:
        for id2 in file2:
            # Rebuild the spaced format: '1-1531-1' -> '0001 01531 1'
            parts = id2.split('-')
            match_str = '{} {} {}'.format(parts[0].zfill(4), parts[1].zfill(5), parts[2])
            if id1 == match_str:
                matches.write('{}\n'.format(id1))
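The reformatting step inside the loop can be pulled out into a small function and checked against the IDs from the question. This is a sketch: `to_spaced` is a hypothetical name, and it assumes the fixed pad widths of 4 and 5 described above. Converting `file2` once into a set also avoids re-splitting every ID on each pass of the outer loop.

```python
def to_spaced(dashed_id):
    """Convert '1-1531-1' to '0001 01531 1' (pad widths of 4 and 5 are assumed)."""
    first, second, third = dashed_id.strip().split('-')
    return '{} {} {}'.format(first.zfill(4), second.zfill(5), third)

file1 = ['0001 01531 1', '0001 01535 1', '0001 09999 1']  # last ID is made up, no match
file2 = ['1-1531-1', '1-1535-1', '1-1538-1']

# Convert file2 once, then test each file1 ID with a set lookup.
converted = {to_spaced(x) for x in file2}
matched = [x for x in file1 if x in converted]
print(matched)  # ['0001 01531 1', '0001 01535 1']
```

The set lookup replaces the inner loop, so the work grows with len(file1) + len(file2) rather than their product.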

Upvotes: 0

c2huc2hu

Reputation: 2497

Converting your first format to the second:

import re

def convert(word):
    # '0001 01531 1' -> '0001-01531-1'
    word = word.strip().replace(' ', '-')
    # Remove runs of 0s that start at a word boundary (start of string or just after a '-')
    return re.sub(r'\b0+', '', word)
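Here is what the leading-zero pattern does to IDs once their spaces have been replaced by hyphens. The third sample is a hypothetical all-zero field, not from the question: the bare pattern `\b0+` erases it completely, while `\b0+(?=\d)` (a lookahead requiring a digit after the zeros) keeps one 0.

```python
import re

samples = ['0001-01531-1', '0012-00012-1', '0000-01531-1']
results = {}
for s in samples:
    plain = re.sub(r'\b0+', '', s)            # strip every leading-zero run
    keep_one = re.sub(r'\b0+(?=\d)', '', s)   # strip zeros only if a digit follows
    results[s] = (plain, keep_one)
    print(s, '->', plain, '|', keep_one)
# 0001-01531-1 -> 1-1531-1 | 1-1531-1
# 0012-00012-1 -> 12-12-1 | 12-12-1
# 0000-01531-1 -> -1531-1 | 0-1531-1
```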

Algorithmic Improvement

You can compute the intersection in O(n+m) expected time by converting both lists to sets and intersecting them:

file1_ids = {convert(line) for line in file1}
file2_ids = {line.strip() for line in file2}  # strip stray whitespace/newlines

matches = file1_ids.intersection(file2_ids)
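If you need the full rows rather than just the IDs (the question mentions many columns per object), the same idea works with a dict keyed by the converted ID instead of a set, still making one pass over each file. This is a sketch under assumptions the question does not state: in file 1 the ID occupies the first three whitespace-separated fields, in file 2 it is the first field, and the helper names and sample rows are hypothetical. It converts with `int()` instead of a regex, which drops leading zeros the same way.

```python
def spaced_to_dashed(fields):
    """Turn ['0001', '01531', '1'] into '1-1531-1' by dropping leading zeros."""
    return '-'.join(str(int(f)) for f in fields)

def match_rows(lines1, lines2):
    """Pair up rows from both files whose object IDs refer to the same object."""
    # Index file 2 by its ID column (assumed to be the first field).
    # If an ID appears more than once, the last row wins.
    index2 = {}
    for line in lines2:
        fields = line.split()
        if fields:  # skip blank lines
            index2[fields[0]] = line.rstrip('\n')
    pairs = []
    for line in lines1:
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or short lines
        key = spaced_to_dashed(fields[:3])  # ID assumed to be the first three fields
        if key in index2:
            pairs.append((line.rstrip('\n'), index2[key]))
    return pairs

file1_lines = ['0001 01531 1 3.2\n', '0001 01535 1 4.7\n']
file2_lines = ['1-1531-1 0.5\n', '1-1538-1 0.9\n']
print(match_rows(file1_lines, file2_lines))
# [('0001 01531 1 3.2', '1-1531-1 0.5')]
```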

Upvotes: 1
