anakaine

Reputation: 1248

Dask: masking a dataframe based on multiple conditions to perform selective calculations

Using dask, I want to replace values in rows where multiple conditions are met. The replacement value is held in another column: if the conditions are met, the target value is replaced with the value from that column.

I'd like to stay in dask rather than performing this action with another library if possible because of memory constraints when shifting dataframes around.

At the moment, I'm attempting to use the .mask method.

Where GrassDeadFMC >= 12 and WindSpeed <= 10, make GrassFMCoefficient equal to the value in GFMG12L10:

ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask(ddf['GrassDeadFMC'] >= 12 & ddf['WindSpeed'] <= 10)

The error I'm receiving is:

ValueError: Metadata inference failed in `and_`.

Original error is below:
------------------------
TypeError('cannot compare a dtyped [float32] array with a scalar of type [bool]')

Below is a minimal executable script. It gives a slightly different error, but I guess it suffers from the same issue.

import dask.dataframe as dd
import pandas as pd
from random import randint
df = pd.DataFrame({'GrassFMCoefficient': [0 for x in range(10)],
                   'GFMG12L10': [randint(1, 50) for x in range(10)],
                   'GrassDeadFMC': [randint(1, 50) for x in range(10)],
                   'WindSpeed': [randint(1, 30) for x in range(10)]})
ddf = dd.from_pandas(df,npartitions=1)
ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask(ddf['GrassDeadFMC'] >= 12 & ddf['WindSpeed'] <= 10)
print(ddf.head(10))

Any help on this would be appreciated.

Upvotes: 0

Views: 1963

Answers (1)

Fariliana Eri

Reputation: 301

Do you want a result like this?


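You have to wrap each condition in parentheses, e.g. (condition1) & (condition2). In Python, & binds more tightly than comparison operators such as >= and <=, so without the parentheses 12 & ddf['WindSpeed'] is evaluated first. With the parentheses, each comparison produces a Boolean series, and & then combines Boolean with Boolean.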
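To see the precedence issue without dask, here is a small sketch using plain integers as stand-ins for a single row of each column. The variable names and sample values are only for illustration and are not from the original post.

```python
# Plain integers stand in for one row of GrassDeadFMC and WindSpeed.
grass_dead_fmc, wind_speed = 5, 3

# Without parentheses, & binds tighter than >= and <=, so Python reads this
# as the chained comparison: grass_dead_fmc >= (12 & wind_speed) <= 10.
unparenthesised = grass_dead_fmc >= 12 & wind_speed <= 10

# With parentheses, each comparison is evaluated first and the two
# booleans are then combined with &.
parenthesised = (grass_dead_fmc >= 12) & (wind_speed <= 10)

print(unparenthesised, parenthesised)  # prints: True False
```

The two expressions disagree because 12 & 3 is 0, which turns the first one into 5 >= 0 <= 10. On a dask or pandas column, the same parsing means the bitwise AND is applied to the integer 12 and the WindSpeed column before any comparison happens, which appears to be the `and_` step the error message points at. With both comparisons parenthesised, the line from the question becomes: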

ddf['GrassFMCoefficient'] = ddf['GFMG12L10'].mask((ddf['GrassDeadFMC'] >= 12) & (ddf['WindSpeed'] <= 10))
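One thing worth checking after the parentheses fix: .mask(cond) with no second argument replaces the rows where the condition is True with NaN and keeps the rest. If the goal is, as I read the question, to copy GFMG12L10 into GrassFMCoefficient only where the condition holds, the replacement can be passed as the second argument (other). The sketch below uses pandas with made-up sample values so the result is easy to check by eye; dask's Series.mask takes the same cond and other arguments, so the same call shape should carry over to ddf, though I have not run it against the original data.

```python
import pandas as pd

# Small, hand-picked sample (not the data from the question).
df = pd.DataFrame({
    "GrassFMCoefficient": [0, 0, 0, 0],
    "GFMG12L10": [7, 8, 9, 10],
    "GrassDeadFMC": [20, 5, 15, 12],
    "WindSpeed": [5, 5, 25, 10],
})

# Each comparison is parenthesised so that & combines two Boolean series.
cond = (df["GrassDeadFMC"] >= 12) & (df["WindSpeed"] <= 10)

# mask(cond) alone blanks out the matching rows with NaN.
nan_where_true = df["GFMG12L10"].mask(cond)

# mask(cond, other) takes the value from `other` where cond is True
# and keeps the existing GrassFMCoefficient value elsewhere.
df["GrassFMCoefficient"] = df["GrassFMCoefficient"].mask(cond, df["GFMG12L10"])

print(df)
```

Here only the first and last rows satisfy both conditions, so only those rows receive the GFMG12L10 value.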

Upvotes: 1
