Bowen Peng

Reputation: 1815

How to call pandas dataframe apply function to return two variables

I want to use pandas apply() on a DataFrame column so that it returns two values, which I then assign to two new columns.

For example:

print(word_list)
['abc', 'lmn']

def is_related_content(x):
    for y in word_list:
        if y in x:
            return x, y
    return '', ''

print(df.head())
    str1        
    abcdef      
    hijklmn     
    asddada    
    
# call apply() function like this
df['string'], df['substring'] = df['str1'].apply(lambda x: is_related_content(x))

# it should be like this
print(df.head())
    str1        string      substring
    abcdef      abcdef      abc
    hijklmn     hijklmn     lmn
    asddada     ''          ''

But running it (in my real code, with news_df) raises the following error:

news_df['merge_' + col], news_df[col] = news_df['content'].fillna("").apply(lambda x: is_related_content(x))
ValueError: too many values to unpack (expected 2)
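A self-contained reproduction of the error, assuming a three-row frame built from the sample values above (the DataFrame construction is an assumption; the question does not show how df is created). It shows that apply returns one Series of tuples, and that unpacking it into two targets fails:

```python
import pandas as pd

# Sample data from the question; building df this way is an assumption
word_list = ['abc', 'lmn']
df = pd.DataFrame({'str1': ['abcdef', 'hijklmn', 'asddada']})

def is_related_content(x):
    # Return (string, first matching word), or two empty strings if none match
    for y in word_list:
        if y in x:
            return x, y
    return '', ''

result = df['str1'].apply(is_related_content)
print(type(result).__name__)  # Series: ONE Series, not two
print(result.tolist())        # [('abcdef', 'abc'), ('hijklmn', 'lmn'), ('', '')]

error_message = None
try:
    # Unpacking iterates over the Series' values (one tuple per row),
    # and three rows cannot be unpacked into two targets
    df['string'], df['substring'] = result
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```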

Could anyone help me?
Thanks in advance.

Upvotes: 0

Views: 453

Answers (2)

akuiper

Reputation: 214927

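You need a tuple of Series for the unpacking syntax to work, but apply returns a single Series of tuples. Unpacking that Series iterates over its values (one tuple per row), so with more than two rows Python reports too many values for the two targets. You can use the .str accessor after apply to pick out each element of the tuples: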

Update:

s = df['str1'].apply(lambda x: is_related_content(x))
df['string'], df['substring'] = s.str[0], s.str[1]
df
#      str1   string substring
#0   abcdef   abcdef       abc
#1  hijklmn  hijklmn       lmn
#2  asddada                   
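
Original answer (iterates over the .str accessor directly, which is deprecated in newer pandas versions):

df['string'], df['substring'] = df['str1'].apply(lambda x: is_related_content(x)).str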

df
#      str1   string substring
#0   abcdef   abcdef       abc
#1  hijklmn  hijklmn       lmn
#2  asddada                   
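A runnable sketch of this answer's fix, using the sample data from the question (the DataFrame construction is assumed). For comparison it also shows a plain-Python alternative that is not part of the answer: zip(*s) transposes the tuples into two sequences, which unpack cleanly into two targets.

```python
import pandas as pd

# Sample data from the question; building df this way is an assumption
word_list = ['abc', 'lmn']
df = pd.DataFrame({'str1': ['abcdef', 'hijklmn', 'asddada']})

def is_related_content(x):
    # Return (string, first matching word), or two empty strings if none match
    for y in word_list:
        if y in x:
            return x, y
    return '', ''

s = df['str1'].apply(is_related_content)

# Fix from the answer: .str[i] takes element i of every tuple, giving two Series
df['string'], df['substring'] = s.str[0], s.str[1]

# Alternative (not from the answer): zip(*s) turns N pairs into 2 sequences of N
strings, substrings = zip(*s)
df['string_zip'] = list(strings)
df['substring_zip'] = list(substrings)

print(df)
```

Both variants give the same columns here. Note that zip(*s) yields nothing for an empty Series, so the two-target unpacking would then fail; guard against that case if the frame may be empty.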

Upvotes: 1

ThePyGuy

Reputation: 18406

The function is_related_content returns a tuple for each value in the column it is applied to, so apply produces a single Series with one tuple per row, and that cannot be unpacked into two columns the way the question tries. One solution is to apply pd.Series to each tuple and assign the result to a list of new columns: pd.Series turns every tuple into a row with one column per element, so the tuples are split across multiple columns (similar to how explode splits list values across multiple rows):

>>> df[['string', 'substring']] = df['str1'].apply(is_related_content).apply(pd.Series)
>>> df
      str1   string substring
0   abcdef   abcdef       abc
1  hijklmn  hijklmn       lmn
2  asddada  
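A self-contained version of this approach (the DataFrame construction is assumed from the question's sample data). For comparison it also builds the two columns with pd.DataFrame(s.tolist(), index=df.index), an alternative that is not part of the answer: it converts the whole list of tuples in one call instead of creating one pd.Series per row, which is generally faster on large frames.

```python
import pandas as pd

# Sample data from the question; building df this way is an assumption
word_list = ['abc', 'lmn']
df = pd.DataFrame({'str1': ['abcdef', 'hijklmn', 'asddada']})

def is_related_content(x):
    # Return (string, first matching word), or two empty strings if none match
    for y in word_list:
        if y in x:
            return x, y
    return '', ''

s = df['str1'].apply(is_related_content)

# Answer's approach: pd.Series expands each tuple into a row with columns 0 and 1
expanded = s.apply(pd.Series)
# Assigning a DataFrame to a list of column names matches columns by position
df[['string', 'substring']] = expanded

# Alternative (not from the answer): build the frame from the list of tuples at once;
# index=df.index keeps the rows aligned with df
expanded_fast = pd.DataFrame(s.tolist(), index=df.index, columns=['string', 'substring'])

print(df)
print(expanded_fast)
```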

Upvotes: 1
