Alex
Alex

Reputation: 189

apply function on subset of dataframe rows in column based on value in other column

New to pandas, so please bear with me.

I have a text processing function I'd like to run on a column in my datafame conditional on the value in another column. I've seen

Depending on whether something is flagged, I want to run a translation function on it.

   account  article    ... translation  flag
0    123      text      ...               1
1    123      text      ...               0
2    123      text      ...               1

I tried:

df['translation'] = df[['flag', 'text']].apply(lambda x: translate(['article']) if ['flag'] == 1 else None)

and get this in return:

ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index flag')

Any help or guidance would be appreciated.

Upvotes: 8

Views: 16644

Answers (3)

lux7
lux7

Reputation: 2150

you could use the .loc method. The advantage is that the code is more readable:

df.loc[df['flag'] == 1, 'translation'] = df['article'].apply(translate)

First you define the rows of interest using a conditional statement (df['flag'] == 1) and second to column to which the new values should be assigned to ('translation')

Edit:

this results in the same output:

  article  flag  account translation
0   text1     1      123     _text1_
1   text2     0      123         NaN
2   text3     1      123     _text3_

If you need to remaining rows to be None, than initialize the the column with None:

df['translation'] = None
df.loc[df['flag'] == 1, 'translation'] = df['article'].apply(translate)

If you want to execute translate only on the flagged rows, than you can use .log twice:

df.loc[df['flag'] == 1, 'translation'] = df.loc[df['flag'] == 1,'article'].apply(translate)

Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html

Upvotes: 1

Valdi_Bo
Valdi_Bo

Reputation: 30971

I used a test DataFrame similar to yours, without translation column:

   account article  flag
0      123   text1     1
1      123   text2     0
2      123   text3     1

Then I defined a "surrogate" translation function:

def translate(txt):
    return '_' + txt + '_'

And to call it conditionally, run:

df['translation'] = df.apply(lambda row:
    translate(row.article) if row.flag == 1 else None, axis=1)

The result is:

   account article  flag translation
0      123   text1     1     _text1_
1      123   text2     0        None
2      123   text3     1     _text3_

What was wrong in your code:

  1. In you want to limit source data to a subset of columns, use existing column names (article instead text) and include all columns used in the applied function.

  2. The lambda function is applied to each row, so you should have passed axis=1 parameter (default axis is 0).

  3. When your function is called, the current row is passed as a parameter (x), but to refer to some column in it you should have used x.column_name notation. E.g. my solution could also be:

     df[['article', 'flag']].apply(lambda row:
         translate(row.article) if row.flag == 1 else None, axis=1)
    
  4. Argument like ['article'] is here just a list, containing one word (article). I doubt whether your translation function is able to process a list argument.

  5. A similar remark concerning if ['flag'] .... This is not a reference to a column in the source row.

Upvotes: 11

Andy L.
Andy L.

Reputation: 25239

IIUC, you may try map and where

df['translation'] = df['article'].map(translate).where(df['flag'].eq(1), None)

Upvotes: 1

Related Questions