Updating the values of a column in a dask dataframe based on some condition on some other columns

Question

We have a very large CSV file which has been imported as a dask dataframe. Here is a small example to illustrate the question.

import dask.dataframe as dd
df = dd.read_csv("name and path of the file.csv")
df.head()

Output:

col1 | col2 | col3 | col4
22   | NaN  | 23   | 56
12   | 54   | 22   | 36
48   | NaN  | 2    | 45
76   | 32   | 13   | 6
23   | NaN  | 43   | 8
67   | 54   | 56   | 64
16   | 32   | 32   | 6
3    | 54   | 64   | 8
67   | NaN  | 23   | 64

I want to replace the value of col4 with col1 if col4 < col1 and col2 is not NaN.

So the result should be:

col1 | col2 | col3 | col4
22   | NaN  | 23   | 56
12   | 54   | 22   | 36
48   | NaN  | 2    | 45
76   | 32   | 13   | 76
23   | NaN  | 43   | 8
67   | 54   | 56   | 67
16   | 32   | 32   | 16
3    | 54   | 64   | 8
67   | NaN  | 23   | 64


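I know how to do it in pandas:

import pandas as pd

# Index of the rows where col4 < col1 and col2 is not NaN
condition = df[(df['col4'] < df['col1']) & (pd.notnull(df['col2']))].index

# Overwrite col4 with the col1 value on those rows
df.loc[condition, 'col4'] = df.loc[condition, 'col1'].values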


Answers (1)
