Reputation: 189
New to pandas, so please bear with me.
I have a text processing function I'd like to run on a column in my datafame conditional on the value in another column. I've seen
Depending on whether something is flagged, I want to run a translation function on it.
account article ... translation flag
0 123 text ... 1
1 123 text ... 0
2 123 text ... 1
I tried:
df['translation'] = df[['flag', 'text']].apply(lambda x: translate(['article']) if ['flag'] == 1 else None)
and get this in return:
ValueError: ('The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().', 'occurred at index flag')
Any help or guidance would be appreciated.
Upvotes: 8
Views: 16644
Reputation: 2150
you could use the .loc
method. The advantage is that the code is more readable:
df.loc[df['flag'] == 1, 'translation'] = df['article'].apply(translate)
First you define the rows of interest using a conditional statement (df['flag'] == 1
) and second to column to which the new values should be assigned to ('translation'
)
Edit:
this results in the same output:
article flag account translation
0 text1 1 123 _text1_
1 text2 0 123 NaN
2 text3 1 123 _text3_
If you need to remaining rows to be None
, than initialize the the column with None
:
df['translation'] = None
df.loc[df['flag'] == 1, 'translation'] = df['article'].apply(translate)
If you want to execute translate
only on the flagged rows, than you can use .log
twice:
df.loc[df['flag'] == 1, 'translation'] = df.loc[df['flag'] == 1,'article'].apply(translate)
Source: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html
Upvotes: 1
Reputation: 30971
I used a test DataFrame similar to yours, without translation column:
account article flag
0 123 text1 1
1 123 text2 0
2 123 text3 1
Then I defined a "surrogate" translation function:
def translate(txt):
return '_' + txt + '_'
And to call it conditionally, run:
df['translation'] = df.apply(lambda row:
translate(row.article) if row.flag == 1 else None, axis=1)
The result is:
account article flag translation
0 123 text1 1 _text1_
1 123 text2 0 None
2 123 text3 1 _text3_
What was wrong in your code:
In you want to limit source data to a subset of columns, use existing column names (article instead text) and include all columns used in the applied function.
The lambda function is applied to each row, so you should have passed axis=1 parameter (default axis is 0).
When your function is called, the current row is passed as a parameter (x), but to refer to some column in it you should have used x.column_name notation. E.g. my solution could also be:
df[['article', 'flag']].apply(lambda row:
translate(row.article) if row.flag == 1 else None, axis=1)
Argument like ['article'] is here just a list, containing one word (article). I doubt whether your translation function is able to process a list argument.
A similar remark concerning if ['flag'] .... This is not a reference to a column in the source row.
Upvotes: 11
Reputation: 25239
IIUC, you may try map
and where
df['translation'] = df['article'].map(translate).where(df['flag'].eq(1), None)
Upvotes: 1