Check if a column of a pandas dataframe contains a substring for each row of a different column?

Question

I have been stuck on what I considered originally to be a simple task for quite some while. Here I will use sample data as the actual problem data is much messier (and confidential). Essentially I have two columns both containing strings. I want to check for each row of column 'substring', if it is a substring of any of the rows of column 'string':

s1 = ['good', 'how', 'hello', 'start']
s2 = ['exit', 'hello you','where are you', 'goodbye']
test = pd.DataFrame({'substring':s1, 'string':s2})
>>> test

    string           substring
0   exit             good
1   hello you        how
2   where are you    hello
3   goodbye          start

Essentially I would like some indicator for each row if column A if it is a substring of anywhere in column B:

>>>test
    string           substring   C
0   exit             good        True
1   hello you        how         False
2   where are you    hello       True
3   goodbye          start       False

I have seemed to tried many things and I have just become lost.

I have tried iterating over the rows:

sub_test = pd.DataFrame(columns=test.columns)

    for index, row in test.iterrows():
        a = row['substring']
        delta = test[test['string'].str.contains(a)]
        if len(delta.index > 1):
            sub_test = pd.concat([sub_test, delta])

Which gets me some of the way and returns:

>>>sub_test

    string      substring
3   goodbye     start
1   hello you   how

I would think there is a way of doing this using lambda but I have not been successful:

test['C'] = test.apply(lambda row: row['substring'] in policies['substring'], axis = 1)

Any help would be appreciated. Thanks

Check if a column of a pandas dataframe contains a substring for each row of a different column?

Answers (1)

Related Questions