Puneet Tripathi

Reputation: 422

Create an if-else condition column in dask dataframe

I need to create a column based on a condition in a dask dataframe. In pandas this is fairly straightforward:

ddf['TEST_VAR'] = ['THIS' if x == 200607 else  
              'NOT THIS' if x == 200608 else 
              'THAT' if x == 200609 else 'NONE'  
              for x in ddf['shop_week'] ]

In dask, I have to do the same thing as follows:

def f(x):
    if x == 200607:
        y = 'THIS'
    elif x == 200608:
        y = 'THAT'
    else:
        y = 1
    return y

ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
ddf1.compute()

Questions:

  1. Is there a better/more straightforward way to achieve it?
  2. I can't modify the original dataframe ddf; I need to create ddf1 to see the changes. Is a dask dataframe an immutable object?

Upvotes: 12

Views: 4300

Answers (3)

Zelazny7

Reputation: 40628

A better approach might be to pull the column out as a dask array, apply some nested where operations, and then add the result back to the dataframe:

import dask.array as da

x = ddf['shop_week'].to_dask_array()

# Assign back to the same dask dataframe the column was taken from (ddf)
ddf['TEST_VAR'] = \
    da.where(x == 200607, 'THIS',
    da.where(x == 200608, 'NOT THIS',
    da.where(x == 200609, 'THAT', 'NONE')))

ddf['TEST_VAR'].compute()

Upvotes: 0

MRocklin

Reputation: 57261

Answers:

  1. What you're doing now is almost OK. You don't need to call compute until you're ready for your final answer:

    # ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
    ddf1 = ddf.assign(col1 = ddf.shop_week.apply(f))
    

    For some cases, dd.Series.where might be a good fit:

    ddf1 = ddf.assign(col1 = ddf.shop_week.where(cond=ddf.balance > 0, other=0))
    
  2. As of version 0.10.2 you can insert columns directly into dask.dataframes:

    ddf['col'] = ddf.shop_week.apply(f)
    

Upvotes: 7

Ohumeronen

Reputation: 2086

You could just use:

f = lambda x: 'THIS' if x == 200607 else 'NOT THIS' if x == 200608 else 'THAT' if x == 200609 else 'NONE'

And then:

ddf1 = ddf.assign(col1 = list(ddf.shop_week.apply(f).compute()))
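If the chained conditional gets hard to read, the same mapping can be sketched as a dictionary lookup. The names `WEEK_LABELS` and `label_week` are made up for this example; the labels are the ones used in the lambda above.

```python
# Hypothetical lookup table with the same labels as the lambda.
WEEK_LABELS = {200607: "THIS", 200608: "NOT THIS", 200609: "THAT"}


def label_week(x):
    """Return the label for a shop_week value, or 'NONE' if it is unmapped."""
    return WEEK_LABELS.get(x, "NONE")


# Quick check on plain Python values.
labels = [label_week(w) for w in (200607, 200608, 200609, 200610)]
```

`label_week` can be passed to `ddf.shop_week.apply` in place of `f`.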

Unfortunately I don't have an answer to the second question or I don't understand it...

Upvotes: 1
