Sunil Varma

Reputation: 39

Alternative to pandas-style df.loc assignment in Dask (assigning with = through loc does not work in Dask)

for col1 in columns_1:
  for col2 in columns_2:
    df.loc[df['any_column_in_df'] == col2, col1] = 0

What I want: an alternative way to do this in Dask. The code above works in pandas.

Problem: I can't assign with = through df.loc in Dask, presumably because in-place assignment is not supported.

Explanation: I want to set 0 (or another value) wherever the condition is met and get a DataFrame back, not a Series. I tried mask, and also map_partitions with df.replace, which works fine for this simple one-column value replacement and returns a DataFrame as required:

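import numpy as np
import pandas as pd

def replace(x: pd.DataFrame) -> pd.DataFrame:
  # Runs on each pandas partition: replace NaN with 0 in a single column
  return x.replace(
    {'any_column_to_replace_value': [np.nan]},
    {'any_column_to_replace_value': [0]}
  )
df = df.map_partitions(replace)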

How can I do the same for the first snippet and still get a DataFrame back?

Thanks in advance. I'm new to Dask and still exploring it, so any help from Dask experts is appreciated.

Upvotes: 0

Views: 359

Answers (1)

Sunil Varma

Reputation: 39

Answer by @martindurant on Gitter:

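This is a row-wise computation, so you can use apply or map_partitions. With map_partitions, the function receives each partition as an ordinary pandas DataFrame, so loc assignment works inside it: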

def process(df):
  df = df.copy()  # work on a copy so the input partition is not modified in place
  for col1 in columns_1:
    for col2 in columns_2:
      df.loc[df['any_column_in_df'] == col2, col1] = 0
  return df

df2 = df.map_partitions(process)

Upvotes: 1
