Reputation: 11
Lets say I've four columns
Date | Col_1 | Col_2 | Col_3 |
---|---|---|---|
2022-01-01 | NaN | 1.0 | 2.0 |
2022-02-01 | NaN | NaN | 3.0 |
2022-03-01 | 3.0 | 5.0 | 4.0 |
2022-04-01 | NaN | NaN | 5.0 |
and I want to fill in columns with null values base on the date column where dates less then 2022-03-01 gets fill with a 0.
Result wanted:
Date | Col_1 | Col_2 | Col_3 |
---|---|---|---|
2022-01-01 | 0.0 | 1.0 | 2.0 |
2022-02-01 | 0.0 | 0.0 | 3.0 |
2022-03-01 | 3.0 | 5.0 | 4.0 |
2022-04-01 | NaN | NaN | 5.0 |
My Code:
null_column = [x for x in list(df.columns) if df[x].isnull().sum()>0]
for c in null_column:
df[c] = df.apply(lambda x: 0 if (np.isnan(df[c])) & (df['Date']<'2022-03-01') else x)
but I'm getting "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
Upvotes: 1
Views: 2013
Reputation: 13242
Don't try to iterate over Dataframes, it's basically never the best method, and throws out the point of using a Dataframe. Try to find the appropriate Vectorized pandas
function for whatever it is you're trying to do, it probably exists.
mask = df.Date.lt('2022-03-01')
df.loc[mask] = df.loc[mask].fillna(0)
print(df)
Output:
Date Col_1 Col_2 Col_3
0 2022-01-01 0.0 1.0 2.0
1 2022-02-01 0.0 0.0 3.0
2 2022-03-01 3.0 5.0 4.0
3 2022-04-01 NaN NaN 5.0
To modify a single column:
mask = df.Date.lt('2022-03-01')
df.loc[mask, 'Col_1'] = df.loc[mask, 'Col_1'].fillna(0)
# Output:
Date Col_1 Col_2 Col_3
0 2022-01-01 0.0 1.0 2.0
1 2022-02-01 0.0 NaN 3.0
2 2022-03-01 3.0 5.0 4.0
3 2022-04-01 NaN NaN 5.0
You may consider making Date
a datetime index if they're unique:
df = df.set_index('Date')
# This simplifies indexing with Dates:
print(df.loc[:'2022-03-01'])
# Output:
Col_1 Col_2 Col_3
Date
2022-01-01 0.0 1.0 2.0
2022-02-01 0.0 NaN 3.0
2022-03-01 3.0 5.0 4.0
Upvotes: 1