Reputation: 39
for col1 in columns_1:
    for col2 in columns_2:
        df.loc[df['any_column_in_df'] == col2, col1] = 0
What I want: an alternative way to do this in Dask. The code above works in pandas.
Problem: I can't use assignment (=) with df.loc in Dask, because in-place item assignment is not supported.
Explanation: I want to assign 0 (or another value) wherever the condition is met and get back a DataFrame, not a Series. I tried mask, and map_partitions with df.replace. The latter works fine for this simple single-column replacement and returns a DataFrame as required:
import numpy as np
import pandas as pd

def replace(x: pd.DataFrame) -> pd.DataFrame:
    # Replace NaN with 0 in the chosen column of one partition.
    return x.replace(
        {'any_column_to_replace_value': [np.nan]},
        {'any_column_to_replace_value': [0]}
    )

df = df.map_partitions(replace)
How can I do the same for the first snippet and still get a DataFrame back?
Thanks in advance. I'm new to Dask and still exploring it, so any help from Dask experts is appreciated.
Upvotes: 0
Views: 359
Reputation: 39
Answer by @martindurant on gitter…
This is a row-wise computation, so you can use apply or map_partitions:
def process(df):
    # Each partition is a plain pandas DataFrame,
    # so .loc assignment works here.
    for col1 in columns_1:
        for col2 in columns_2:
            df.loc[df['any_column_in_df'] == col2, col1] = 0
    return df

df2 = df.map_partitions(process)
Upvotes: 1