connor449

Reputation: 1679

How to edit 2 lists so that they match in python

I have two lists, a and b. They look like this:

a = [
 'And',
 "you're",
 'going',
 'to',
 'use',
 'some',
 'handouts.',
 'Okay.',
 'So',
 'I',
 'needed',
 'to',
 'know',
 'and',
 'for,'
]
b = [
 'And',
 "you're",
 'going',
 'to',
 'use',
 'some',
 'handouts.',
 'Okay.',
 'I',
 'needed',
 'to',
 'know',
 'and',
 'for,',
 'it'
]

I want to be able to zip them together so that each pair matches. However, they do not, as can be seen here:

x = list(zip(a,b))
for i in x:
    print(i)

('And', 'And')
("you're", "you're")
('going', 'going')
('to', 'to')
('use', 'use')
('some', 'some')
('handouts.', 'handouts.')
('Okay.', 'Okay.')
---> ('So', 'I')
('I', 'needed')
('needed', 'to')
('to', 'know')
('know', 'and')
('and', 'for,')
('for,', 'it')

It can be seen that a contains 'So' and b does not. To fix this, I want to drop 'So' from a, which would result in this:

('And', 'And')
("you're", "you're")
('going', 'going')
('to', 'to')
('use', 'use')
('some', 'some')
('handouts.', 'handouts.')
('Okay.', 'Okay.')
('I', 'I')
('needed', 'needed')
('to', 'to')
('know', 'know')
('and', 'and')
('for,', 'for,')

Essentially, if a word exists in one list but not in the other at roughly the same index, I want to remove it, regardless of whether it is in a or b. I have used the fuzzywuzzy library for fuzzy matching, which works reasonably well, but it is very slow. Are there more efficient ways to do this?
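One standard-library option for this kind of positional matching is `difflib.SequenceMatcher`, which can compare lists of hashable items (not just strings) and report the runs where they line up. The sketch below is an illustration, not the asker's method: it swaps fuzzy scoring for exact token matching, keeps only the tokens inside matching blocks, and the `align` function and its `key` hook are hypothetical names. It should avoid scoring every word pair, though the difflib docs note its worst case is still quadratic.

```python
from difflib import SequenceMatcher


def align(a, b, key=lambda w: w):
    """Return copies of a and b that keep only the runs SequenceMatcher aligns.

    `key` (a hypothetical hook) normalizes tokens before comparison;
    the original, un-normalized tokens are what gets returned.
    """
    ka = [key(w) for w in a]
    kb = [key(w) for w in b]
    # autojunk=False so frequent words such as 'to' are never ignored as junk
    matcher = SequenceMatcher(None, ka, kb, autojunk=False)
    kept_a, kept_b = [], []
    # Each block (i, j, size) means ka[i:i+size] == kb[j:j+size];
    # the last block is a dummy with size 0
    for i, j, size in matcher.get_matching_blocks():
        kept_a.extend(a[i:i + size])
        kept_b.extend(b[j:j + size])
    return kept_a, kept_b


a = ['And', "you're", 'going', 'to', 'use', 'some', 'handouts.', 'Okay.',
     'So', 'I', 'needed', 'to', 'know', 'and', 'for,']
b = ['And', "you're", 'going', 'to', 'use', 'some', 'handouts.', 'Okay.',
     'I', 'needed', 'to', 'know', 'and', 'for,', 'it']

new_a, new_b = align(a, b)
for pair in zip(new_a, new_b):
    print(pair)
```

With these lists, 'So' is dropped from a and 'it' from b, leaving 14 matching pairs. Passing a normalizing key such as `key=lambda w: w.lower().strip('.,')` (an assumed normalization) would also let near-duplicates like 'for' and 'for,' line up.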

Upvotes: 0

Views: 55

Answers (3)

Booboo

Reputation: 44148

The idea is to remove from a those items that are not in b, and vice versa. Sets are an efficient way to compute this for large lists:

a = [
 'And',
 "you're",
 'going',
 'to',
 'use',
 'some',
 'handouts.',
 'Okay.',
 'So',
 'I',
 'needed',
 'to',
 'know',
 'and',
 'for,'
]

b = [
 'And',
 "you're",
 'going',
 'to',
 'use',
 'some',
 'handouts.',
 'Okay.',
 'I',
 'needed',
 'to',
 'know',
 'and',
 'for,',
 'it'
]

set_a = set(a)
set_b = set(b)
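remove_a = set_a - set_b  # words only in a
remove_b = set_b - set_a  # words only in b
# Rebuild the lists instead of calling list.remove(), which deletes only
# the first occurrence and rescans the whole list on every call
a = [item for item in a if item not in remove_a]
b = [item for item in b if item not in remove_b]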
x = list(zip(a,b))
for item in x:
    print(item)

Prints:

('And', 'And')
("you're", "you're")
('going', 'going')
('to', 'to')
('use', 'use')
('some', 'some')
('handouts.', 'handouts.')
('Okay.', 'Okay.')
('I', 'I')
('needed', 'needed')
('to', 'to')
('know', 'know')
('and', 'and')
('for,', 'for,')

Upvotes: 1

Hikash

Reputation: 429

I don't know if this would be faster, but I think you could just use two list comprehensions:

original_a = [ 'And', "you're", 'going' ] # etc
original_b = [ 'And', "you're", 'going' ] # etc
set_a, set_b = set(original_a), set(original_b)  # fast membership tests
common_a = [x for x in original_a if x in set_b]
common_b = [x for x in original_b if x in set_a]
zipped_result = list(zip(common_a, common_b))  # zip is lazy in Python 3

This should preserve order and, I think, get you what you want.

Upvotes: 1

Chris

Reputation: 16147

c = set(a) & set(b)
# if the order does not matter
list(zip(c,c))
# If the order does matter
list(zip([x for x in a if x in c],
         [x for x in b if x in c]))
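With the question's lists, the two variants above behave differently: a set keeps each word only once, so the repeated 'to' collapses and the unordered version yields one pair fewer than the ordered one. A quick check, assuming the lists are exactly as posted:

```python
a = ['And', "you're", 'going', 'to', 'use', 'some', 'handouts.', 'Okay.',
     'So', 'I', 'needed', 'to', 'know', 'and', 'for,']
b = ['And', "you're", 'going', 'to', 'use', 'some', 'handouts.', 'Okay.',
     'I', 'needed', 'to', 'know', 'and', 'for,', 'it']

c = set(a) & set(b)
unordered = list(zip(c, c))  # one pair per distinct shared word
ordered = list(zip([x for x in a if x in c],
                   [x for x in b if x in c]))  # keeps duplicates and order
print(len(unordered), len(ordered))  # 13 14
```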

Upvotes: 1
