Reputation: 579
I have a DataFrame with multiple columns, a few columns being NaN. The dataframe is quite big having around 5,000 columns. Below is a sample from it:
GeoCode ESP FIN USA EZ19 PRT
1 Geography Spain Finland USA EZ Portugal
2 31-Mar-15 NaN NaN 0.26 0.89 NaN
3 30-Jun-15 NaN NaN NaN 0.90 NaN
4 30-Sep-15 NaN NaN 0.31 0.90 NaN
5 31-Dec-15 NaN NaN 0.41 0.91 NaN
I want to copy the value of column 'EZ19' to all columns where all values for row 2 and below are NaN. I tried the following code and it works:
nan_cols = df.columns[df_macro[2:].isnull().all()].to_list()
for c in nan_cols:
df.loc[2:,c]= df.loc[2:,'EZ19']
But I was thinking there should be a way to assign value of column 'EZ19' to the target columns without using a loop and am surprised that there didn't seem to be a straight forward way to do this. Other questions here don't seem to handle the exact issue I have and couldn't find a solution that worked for me.
Given the size of my dataframe(and it is expected to grow larger overtime) I really want to avoid using a loop in my final code so any help with this will be greatly appreciated.
Upvotes: 1
Views: 1581
Reputation: 9619
A simple oneliner that replaces all empty values in a row with the value in EZ19:
df = df.apply(lambda row: row.where(pd.notnull(row), row.EZ19), axis=1)
Output:
GeoCode ESP FIN USA EZ19 PRT
0 Geography Spain Finland USA EZ Portugal
1 31-Mar-15 0.89 0.89 0.26 0.89 0.89
2 30-Jun-15 0.90 0.90 0.90 0.90 0.90
3 30-Sep-15 0.90 0.90 0.31 0.90 0.90
4 31-Dec-15 0.91 0.91 0.41 0.91 0.91
Upvotes: 0
Reputation: 13407
If you're interested in replacing values of columns that contain all nulls, you can take a shortcut and simply overwrite all values below row 2 after identifying those values are entirely null.
# Identify columns that contain null values from row 2 onwards
all_null_cols = df.loc[2:].isnull().all()
# overwrite row 2 onwards in only our null columns with values from "EZ19"
df.loc[2:, all_nulls] = df.loc[2:, ["EZ19"]].values
print(df)
GeoCode ESP FIN USA EZ19 PRT
1 Geography Spain Finland USA EZ Portugal
2 31-Mar-15 0.89 0.89 0.26 0.89 0.89
3 30-Jun-15 0.90 0.90 NaN 0.90 0.90
4 30-Sep-15 0.90 0.90 0.31 0.90 0.90
5 31-Dec-15 0.91 0.91 0.41 0.91 0.91
Upvotes: 2
Reputation: 28644
Not sure if this is what you have in mind:
outcome = df.loc[2:, df.loc[2:].isna().all()].mask(
lambda df: df.isna(), df.loc[2:, "EZ19"], axis=0
)
outcome
ESP FIN PRT
2 0.89 0.89 0.89
3 0.90 0.90 0.90
4 0.90 0.90 0.90
5 0.91 0.91 0.91
df.update(outcome)
df
GeoCode ESP FIN USA EZ19 PRT
1 Geography Spain Finland USA EZ Portugal
2 31-Mar-15 0.89 0.89 0.26 0.89 0.89
3 30-Jun-15 0.90 0.90 NaN 0.90 0.90
4 30-Sep-15 0.90 0.90 0.31 0.90 0.90
5 31-Dec-15 0.91 0.91 0.41 0.91 0.91
It only fills completely null rows from row 2 downwards; USA
is not completely null from row 2, that's why it was not altered.
Upvotes: 1