baxx

Reputation: 4725

Determine whether elements of a pandas Series contain any element of another Series as a substring

Given the following:

s1 = pd.Series(["onee", "twoo", "threee", "fourr"])
s2 = pd.Series(["one", "two"])

How can I get s3 = [True, True, False, False]?

For each element of s1, the corresponding element of s3 should be True if any element of s2 is a substring of it.

Note: the lengths of the Series can vary, so a solution that depends on s2 having a fixed number of elements isn't viable.

I have the following, which I think works, but it doesn't seem like a very nice solution:

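import numpy as np
import pandas as pd

s1 = pd.Series(["onee", "twoo", "threee", "fourr"])
s2 = pd.Series(["one", "two"])

# One row of 0/1 flags per substring in s2
res = []
for s_2 in s2:
    for s_1 in s1:
        if s_2 in s_1:
            res.append(1)
        else:
            res.append(0)

# Reshape to (len(s2), len(s1)) rather than hard-coding 2 rows, then sum each column:
# the number of s2 substrings found in each s1 element (can exceed 1)
solution = np.array(res).reshape((len(s2), len(s1))).sum(axis=0)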


which results in

array([1, 1, 0, 0])
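For comparison, the same check can be written without building and reshaping a flat list, by testing each element of s1 against every element of s2 with Python's built-in any(). This is a minimal sketch, assuming both Series hold plain strings. It returns booleans directly, as s3 was specified, and works for any length of s2 (an empty s2 gives all False).

```python
import pandas as pd

s1 = pd.Series(["onee", "twoo", "threee", "fourr"])
s2 = pd.Series(["one", "two"])

# For each string in s1: True if any string in s2 occurs inside it
s3 = pd.Series([any(sub in text for sub in s2) for text in s1], index=s1.index)

print(s3.tolist())  # [True, True, False, False]
```

If the 0/1 form is preferred, s3.astype(int).to_numpy() gives it.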

Upvotes: 1

Views: 32

Answers (1)

Vishnudev Krishnadas

Reputation: 10960

Use

s1.str.contains('|'.join(s2.str.replace('|', '', regex=False))).astype(int).values

Output

array([1, 1, 0, 0])
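The pattern built by '|'.join(...) is a regular expression. Stripping '|' from s2 stops it from acting as alternation, but other metacharacters (., *, +, ( and so on) stay active. An empty s2 also yields the empty pattern '', which matches every string. Below is a minimal sketch of a more defensive variant. It assumes the elements of s2 should always be matched literally, so it escapes each one with re.escape and handles the empty case explicitly. The helper name contains_any is hypothetical.

```python
import re

import pandas as pd


def contains_any(s1: pd.Series, s2: pd.Series) -> pd.Series:
    """Hypothetical helper: True where an element of s1 contains any element of s2 literally."""
    if s2.empty:
        # '|'.join([]) == '' would match every string, so short-circuit instead
        return pd.Series(False, index=s1.index)
    # Escape each substring so regex metacharacters are matched literally
    pattern = "|".join(re.escape(sub) for sub in s2)
    return s1.str.contains(pattern, regex=True)


s1 = pd.Series(["onee", "twoo", "threee", "fourr"])
s2 = pd.Series(["one", "two"])

print(contains_any(s1, s2).tolist())                # [True, True, False, False]
print(contains_any(s1, s2).astype(int).to_numpy())  # [1 1 0 0]
```

This keeps the single vectorised str.contains call from the answer and changes only how the pattern is built.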

Upvotes: 1
