Matt-pow

Reputation: 986

Check two series are equal with a condition

I have two series and want to check whether they are equal element by element, ignoring case, with one extra condition: a combination of 'a' and 'b' should also count as equal.

first = pd.Series(['a', 'a', 'b', 'c', 'd'])
second = pd.Series(['A', 'B', 'C', 'C', 'K'])

Expected output:

0  True
1  True
2  False
3  True
4  False

So far I know that eq can compare the two series, but I am not sure how to include the condition:

def helper(s1, s2):
    return s1.str.lower().eq(s2.str.lower())

Upvotes: 1

Views: 184

Answers (3)

Paul H

Reputation: 68126

You can use the bitwise logical operators (& and |) to combine the case-insensitive comparison with your additional condition.

So that's:

condition_1 = first.str.casefold().eq(second.str.casefold())
condition_2 = first.str.casefold().isin(['a', 'b']) & second.str.casefold().isin(['a', 'b'])
result = condition_1 | condition_2

Or with numpy:

import numpy

condition_1 = first.str.casefold().eq(second.str.casefold())
condition_2 = numpy.bitwise_and(
    first.str.casefold().isin(['a', 'b']),
    second.str.casefold().isin(['a', 'b'])
)
result = numpy.bitwise_or(condition_1, condition_2)
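
As a quick check, the operator-based variant above can be run against the series from the question. This is only a minimal sketch; the name allowed is introduced here for illustration and is not part of the original answer.

```python
import pandas as pd

first = pd.Series(['a', 'a', 'b', 'c', 'd'])
second = pd.Series(['A', 'B', 'C', 'C', 'K'])

# Letters treated as interchangeable (hypothetical name, not from the answer).
allowed = ['a', 'b']

f = first.str.casefold()
s = second.str.casefold()

# Case-insensitive equality, element by element.
condition_1 = f.eq(s)
# Both values belong to the interchangeable set.
condition_2 = f.isin(allowed) & s.isin(allowed)

result = condition_1 | condition_2
print(result.tolist())  # [True, True, False, True, False]
```

The printed list matches the expected output given in the question.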

Upvotes: 2

Quang Hoang

Reputation: 150735

You can use replace to map every 'a' to 'b' before comparing:

def transform(s):
    return s.str.lower().replace({'a':'b'})

transform(first).eq(transform(second))
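
The same idea can be wrapped in a function that takes the mapping as a parameter, which may help if more letters need to be treated as equivalent later. This is a sketch under assumptions: the function name equal_with_aliases and the aliases parameter are made up for illustration.

```python
import pandas as pd

def equal_with_aliases(s1, s2, aliases):
    """Compare two string Series case-insensitively after mapping aliases.

    `aliases` maps a lower-case value to the value it should be treated as,
    e.g. {'a': 'b'} makes 'a' and 'b' compare equal.
    """
    def normalize(s):
        # Lower-case first, then replace whole values found in `aliases`.
        return s.str.lower().replace(aliases)

    return normalize(s1).eq(normalize(s2))

first = pd.Series(['a', 'a', 'b', 'c', 'd'])
second = pd.Series(['A', 'B', 'C', 'C', 'K'])

result = equal_with_aliases(first, second, {'a': 'b'})
print(result.tolist())  # [True, True, False, True, False]
```

Because both series are normalized the same way, the comparison stays a single eq call.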

Upvotes: 1

You can specify an "ascii_distance", the maximum allowed difference between the character codes of the two lower-cased values, as follows:

import pandas as pd

s1 = pd.Series(['a', 'a', 'b', 'c', 'd'])
s2 = pd.Series(['A', 'A', 'b', 'C', 'F'])

def helper(s1, s2, ascii_distance):
    s1_processed = [ord(c1) for c1 in s1.str.lower()]
    s2_processed = [ord(c2) for c2 in s2.str.lower()]

    print(f'ascii_distance = {ascii_distance}')
    print(f's1_processed = {s1_processed}')
    print(f's2_processed = {s2_processed}')

    # Compare the character codes pairwise.
    return [abs(c1 - c2) <= ascii_distance
            for c1, c2 in zip(s1_processed, s2_processed)]

ascii_distance = 2
print(helper(s1, s2, ascii_distance))

Output:

ascii_distance = 2
s1_processed = [97, 97, 98, 99, 100]
s2_processed = [97, 97, 98, 99, 102]
[True, True, True, True, True]
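
A vectorized sketch of the same distance check is shown below, assuming every value is a single character and both series share the same index. The function name within_ascii_distance is hypothetical. It is also run on the series from the question with a distance of 1.

```python
import pandas as pd

def within_ascii_distance(s1, s2, ascii_distance):
    # Convert each lower-cased single character to its code point.
    codes_1 = s1.str.lower().map(ord)
    codes_2 = s2.str.lower().map(ord)
    # True where the code points differ by at most `ascii_distance`.
    return (codes_1 - codes_2).abs().le(ascii_distance)

first = pd.Series(['a', 'a', 'b', 'c', 'd'])
second = pd.Series(['A', 'B', 'C', 'C', 'K'])

result = within_ascii_distance(first, second, 1)
print(result.tolist())  # [True, True, True, True, False]
```

Note that a distance of 1 also accepts the pair 'b' and 'C' at index 2, so on the question's data this approach differs from the expected output there.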

Upvotes: 0
