GNMO11

Reputation: 2259

Python matching words with same index in string

I have two strings of equal length (the same number of words) and want to find the words that appear at the same index in both. I also want to join consecutive matches into a single string, which is where I am having trouble.

For example, I have two strings:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

What I am looking for is to get the result:

['I am','show']

My current code is as follows:

keys = []
for x in alligned1.split():
    for i in alligned2.split():
        if x == i:
            keys.append(x)

Which gives me:

['I','am','show']

Any guidance or help would be appreciated.

Upvotes: 8

Views: 802

Answers (4)

Matt Davidson

Reputation: 738

A simplification of your code would be:

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

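keys = []
words2 = alligned2.split()  # split once instead of on every iteration
for i, word in enumerate(alligned1.split()):
    # keep a word only if the other string has the same word at this index
    if word == words2[i]:
        keys.append(word)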

We then need to track whether we have just matched a word. Let's do that with a variable, prev, that accumulates the current run of consecutive matches.

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

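keys = []
prev = ''  # current run of consecutive matches
words2 = alligned2.split()
for i, word in enumerate(alligned1.split()):
    if word == words2[i]:
        # extend the current run
        prev = prev + ' ' + word if prev else word
    elif prev:
        # the run just ended, so store it
        keys.append(prev)
        prev = ''
# store a run that reaches the end of the string (here: 'show')
if prev:
    keys.append(prev)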
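
As a rough sketch, the same idea can be wrapped in a function that pairs the words with zip, so strings with different word counts do not raise an IndexError. The function name match_runs is only illustrative and not part of the code above.

```python
def match_runs(first, second):
    """Return runs of consecutive words that match at the same index."""
    runs = []
    current = []  # words of the run being built
    # zip stops at the shorter word list, so no index can be out of range
    for w1, w2 in zip(first.split(), second.split()):
        if w1 == w2:
            current.append(w1)
        elif current:
            # a mismatch closes the current run
            runs.append(' '.join(current))
            current = []
    # a run that reaches the end of the input still has to be stored
    if current:
        runs.append(' '.join(current))
    return runs


print(match_runs('I am going to go to some show',
                 'I am not going to go the show'))
```

For the two strings from the question this prints ['I am', 'show'].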

Upvotes: 3

Bhargav Rao

Reputation: 52191

Well, Kevin's answer is the best and spot on. I tried to do it the brute-force way. It does not look good, but it does the job without any imports.

alligned1 = 'I am going to go to some show'.split(' ')
alligned2 = 'I am not going to go the show'.split(' ')
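keys = []
# keep a word where both strings agree at that index, otherwise None
temp = [v if v == alligned1[i] else None for i, v in enumerate(alligned2)]
temp.append(None)  # sentinel so that a match at the very end is flushed too
tmpstr = ''
for i in temp:
    if i:
        tmpstr += i + ' '
    else:
        if tmpstr: keys.append(tmpstr)
        tmpstr = ''
keys = [i.strip() for i in keys]  # drop the trailing space of each group
print(keys)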

Output

['I am', 'show']

Upvotes: 1

Don

Reputation: 17636

Maybe not very elegant, but it works:

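try:
    from itertools import zip_longest  # Python 3
except ImportError:
    from itertools import izip_longest as zip_longest  # Python 2

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

curr_match = ''
matches = []
for w1, w2 in zip_longest(alligned1.split(), alligned2.split()):
    if w1 != w2:
        # a mismatch ends the current run of matches, if there is one
        if curr_match:
            matches.append(curr_match)
            curr_match = ''
        continue
    if curr_match:
        curr_match += ' '
    curr_match += w1
# store a run that reaches the end of the strings
if curr_match:
    matches.append(curr_match)

print(matches)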

Result:

['I am', 'show']
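
One difference from plain zip may be worth noting: the longest-variant pads the shorter word list with None, so extra trailing words never match and simply close the current run. A small sketch, assuming Python 3 (where the function is itertools.zip_longest) and using made-up example strings:

```python
from itertools import zip_longest

# illustrative inputs with different word counts
short = 'I am here'.split()
longer = 'I am here today again'.split()

# zip stops at the shorter input
zipped = list(zip(short, longer))
# zip_longest continues and fills the missing side with None
padded = list(zip_longest(short, longer))

print(zipped)
print(padded)
```

Here zip yields three pairs, while zip_longest yields five, the last two being (None, 'today') and (None, 'again').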

Upvotes: 0

Kevin

Reputation: 76254

Finding matching words is fairly simple, but putting them in contiguous groups is trickier. I suggest using itertools.groupby, which groups consecutive items that share the same key.

import itertools

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

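results = []
word_pairs = zip(alligned1.split(), alligned2.split())
# groupby yields one group per run of consecutive pairs with the same key;
# the key is True where the two words at an index match
for k, v in itertools.groupby(word_pairs, key=lambda pair: pair[0] == pair[1]):
    if k:
        words = [pair[0] for pair in v]
        results.append(" ".join(words))

print(results)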

Result:

['I am', 'show']
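
To see why this works, it can help to look at what groupby produces for these two strings. The following is only an illustrative sketch; the name groups is an assumption made for the example, and the print call assumes Python 3.

```python
import itertools

alligned1 = 'I am going to go to some show'
alligned2 = 'I am not going to go the show'

word_pairs = zip(alligned1.split(), alligned2.split())

# Materialise every group as (key, words taken from the first string).
# The key is True for a run of matching pairs and False for a run of mismatches.
groups = [
    (matched, [pair[0] for pair in group])
    for matched, group in itertools.groupby(
        word_pairs, key=lambda pair: pair[0] == pair[1])
]

for matched, words in groups:
    print(matched, words)
```

This gives three groups: a True group holding 'I' and 'am', a False group holding the five mismatched words, and a True group holding 'show'. Keeping only the True groups and joining their words yields the result above.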

Upvotes: 10
