sudnyank

Reputation: 63

Find matching substrings in two lists

I have two lists, A and B. They have different lengths, and both contain strings. What is the best way to find the substrings that the two lists have in common?

list_A = ['hello','there','you','are']
list_B = ['say_hellaa','therefore','foursquare']

I would like a list of matching substrings called list_C which contains:

list_C = ['hell','there','are']

I came across this answer, but it requires me to have a list of matching substrings. Is there a way I can get what I want without manually creating a list of matching substrings?

This also does not help me, because the second list contains substrings.

Upvotes: 4

Views: 3073

Answers (5)

Engineero

Reputation: 12908

For funsies, here's an answer that uses regex!

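import re

# Example lists (list_B as used in the other answers)
list_A = ['hello', 'there', 'you', 'are']
list_B = ['hell', 'is', 'here']

matches = []
for pat in list_B:
    # Search for each element of list_B inside the joined list_A
    matches.append(re.search(pat, ' '.join(list_A)))
matches = [mat.group() for mat in matches if mat]
print(matches)
# ['hell', 'here']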

re.search returns a match object for each pattern that is found, and mat.group() gives the matched string. If no match is found (as is the case for the second element in your list_B), re.search returns None, hence the if mat at the end of the list comprehension.
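One caveat with the regex approach: each element of list_B is interpreted as a regular expression, so characters such as . or * would act as wildcards. A minimal sketch of a literal-only variant follows, assuming you want plain substring semantics; find_literal_matches is a hypothetical helper name, not part of the re module.

```python
import re

def find_literal_matches(patterns, words):
    """Return the patterns that occur literally inside any of the words."""
    haystack = ' '.join(words)
    # re.escape stops characters like '.' from being read as regex syntax
    return [pat for pat in patterns if re.search(re.escape(pat), haystack)]

# 'a.c' is treated as literal text, so it does not match 'abc'
print(find_literal_matches(['hell', 'a.c', 'here'], ['hello', 'there', 'abc']))
# ['hell', 'here']
```

Without re.escape, the pattern 'a.c' would match 'abc' because the dot matches any character.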

Upvotes: 0

piRSquared

Reputation: 294218

If I understand correctly, I'd use NumPy:

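import numpy as np

a = np.array(['hello', 'there', 'you', 'are', 'up', 'date'])
b = np.array(['hell', 'is', 'here', 'update'])

# np.char.find is the public name for numpy.core.defchararray.find
# Elements of b that occur inside some element of a
bina = b[np.where(np.char.find(a[:, None], b) > -1)[1]]
# Elements of a that occur inside some element of b
ainb = a[np.where(np.char.find(b, a[:, None]) > -1)[0]]

np.append(bina, ainb)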

array(['hell', 'here', 'up', 'date'], dtype='<U6')

Upvotes: 1

Lyux

Reputation: 463

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
List_C = []

for a in list_A:
    for b in list_B:
        print(a,"<->",b)
        if a in b:
            List_C.append(a)
        if b in a:
            List_C.append(b)

print(List_C)

Upvotes: 0

BENY

Reputation: 323226

Since you tagged pandas, here is a solution using str.contains:

import pandas as pd

S_A = pd.Series(list_A)
S_B = pd.Series(list_B)

# Keep the elements of S_B that occur inside at least one element of S_A
S_B[S_B.apply(lambda x: S_A.str.contains(x)).any(axis=1)]
Out[441]: 
0    hell
2    here
dtype: object

Upvotes: 2

Rakesh

Reputation: 82755

This is one approach, using a list comprehension.

list_A = ['hello','there','you','are']
list_B = ['hell','is','here']
jVal = "|".join(list_A)        # hello|there|you|are

print([i for i in list_B if i in jVal ])

Output:

['hell', 'here']
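Note that this only finds elements of list_B that are contained whole in list_A. For lists like the ones in the question, where each pair of strings shares only a part, one possible reading is to take the longest common substring of every pair. Below is a minimal sketch of that idea using difflib from the standard library. The helper names and the min_len threshold are assumptions introduced for illustration; the threshold of 3 is simply what reproduces the expected list_C by dropping short overlaps such as 'he' or 'ou'.

```python
from difflib import SequenceMatcher

def longest_common_substring(s, t):
    """Return the longest contiguous block shared by s and t."""
    matcher = SequenceMatcher(None, s, t, autojunk=False)
    m = matcher.find_longest_match(0, len(s), 0, len(t))
    return s[m.a:m.a + m.size]

def common_substrings(list_a, list_b, min_len=3):
    """Collect the longest shared substring of every pair, if long enough."""
    found = []
    for a in list_a:
        for b in list_b:
            sub = longest_common_substring(a, b)
            # min_len is an assumed cutoff that filters out incidental overlaps
            if len(sub) >= min_len and sub not in found:
                found.append(sub)
    return found

list_A = ['hello', 'there', 'you', 'are']
list_B = ['say_hellaa', 'therefore', 'foursquare']
print(common_substrings(list_A, list_B))
# ['hell', 'there', 'are']
```

This compares every pair, so the cost grows with len(list_A) * len(list_B); for large lists a different strategy may be needed.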

Upvotes: 3
