python - pass dataframe column as argument in apply function

Question

I have the following dataframe:

In[1]: df = DataFrame({"A": ['I love cooking','I love rowing'], "B": [['cooking','rowing'],['cooking','rowing']]})

Thus the output that I get is:

In[2]: df
Out[1]: 
                A                  B
0  I love cooking  [cooking, rowing]
1   I love rowing  [cooking, rowing]

I want to create a 'C' column where I count the number of occurrences of elements of 'B' in 'A'.

The function I create is:

def count_keywords(x,y):
    a = 0
    for element in y:
        if element in x:
            a += 1
    return a

and then do:

df['A'].apply(count_keywords,args=(df['B'],))

In this case I am passing the entire pandas Series as the argument, so each element of df['B'] that the loop sees is a list, not a string (the strings are the elements of those lists).

So I get:

TypeError: 'in <string>' requires string as left operand, not list

However, if I adjust the function so that:

def count_keywords(x,y): 
    a = 0
    for element in y:
        for new_element in element:
            if new_element in x:
                a += 1
    return a

and then do:

In[3]: df['A'].apply(count_keywords,args=(df['B'],))

the output is:

Out[2]: 
0    2
1    2

This happens because, for every row of df['A'], the function loops through every element of the whole df['B'] series and then through every element of each list.

How can I get the function to check, per dataframe row, the element of df['B'] against the element of df['A'] in the same row, so that the output is:

Out[2]: 
0    1
1    1

Thanks a lot!

maxymoo · Accepted Answer

Another way you could do this is by using a set intersection and taking its size. In theory this may be faster than iterating over the elements, since sets are designed for this kind of membership test:

df['C'] = df.apply(lambda x: len(set(x.B).intersection(set(x.A.split()))), axis = 1)
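For comparison, here is a minimal sketch of how the original loop-based count could be kept while still working row by row. It assumes pandas is imported as pd; the parameter names text and keywords and the extra column C_zip are only illustrative and do not come from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["I love cooking", "I love rowing"],
    "B": [["cooking", "rowing"], ["cooking", "rowing"]],
})

def count_keywords(text, keywords):
    # Count how many keywords occur as substrings of text.
    return sum(1 for keyword in keywords if keyword in text)

# Row-wise apply: with axis=1 each call receives one row,
# so A and B are always taken from the same row.
df["C"] = df.apply(lambda row: count_keywords(row["A"], row["B"]), axis=1)

# Same result without apply: pair the two columns element by element.
df["C_zip"] = [count_keywords(a, b) for a, b in zip(df["A"], df["B"])]

print(df[["C", "C_zip"]])
```

Note that the two approaches are not strictly equivalent: the `in` test matches substrings (a keyword "row" would match "rowing"), while the set intersection on `split()` only matches whole whitespace-separated words.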
