confused_certainties
confused_certainties

Reputation: 579

Copy a column to multiple columns of a DataFrame with Pandas

I have a DataFrame with multiple columns, a few columns being NaN. The dataframe is quite big having around 5,000 columns. Below is a sample from it:

    GeoCode     ESP     FIN     USA     EZ19    PRT      
1   Geography   Spain   Finland USA     EZ      Portugal
2   31-Mar-15   NaN     NaN     0.26    0.89    NaN
3   30-Jun-15   NaN     NaN     NaN     0.90    NaN
4   30-Sep-15   NaN     NaN     0.31    0.90    NaN
5   31-Dec-15   NaN     NaN     0.41    0.91    NaN

I want to copy the value of column 'EZ19' to all columns where all values for row 2 and below are NaN. I tried the following code and it works:

nan_cols = df.columns[df_macro[2:].isnull().all()].to_list()
for c in nan_cols:
   df.loc[2:,c]= df.loc[2:,'EZ19']

But I was thinking there should be a way to assign value of column 'EZ19' to the target columns without using a loop and am surprised that there didn't seem to be a straight forward way to do this. Other questions here don't seem to handle the exact issue I have and couldn't find a solution that worked for me.

Given the size of my dataframe(and it is expected to grow larger overtime) I really want to avoid using a loop in my final code so any help with this will be greatly appreciated.

Upvotes: 1

Views: 1581

Answers (3)

RJ Adriaansen
RJ Adriaansen

Reputation: 9619

A simple oneliner that replaces all empty values in a row with the value in EZ19:

df = df.apply(lambda row: row.where(pd.notnull(row), row.EZ19), axis=1)

Output:

    GeoCode     ESP     FIN     USA     EZ19    PRT
0   Geography   Spain   Finland USA     EZ      Portugal
1   31-Mar-15   0.89    0.89    0.26    0.89    0.89
2   30-Jun-15   0.90    0.90    0.90    0.90    0.90
3   30-Sep-15   0.90    0.90    0.31    0.90    0.90
4   31-Dec-15   0.91    0.91    0.41    0.91    0.91

Upvotes: 0

Cameron Riddell
Cameron Riddell

Reputation: 13407

If you're interested in replacing values of columns that contain all nulls, you can take a shortcut and simply overwrite all values below row 2 after identifying those values are entirely null.

# Identify columns that contain null values from row 2 onwards
all_null_cols = df.loc[2:].isnull().all()

# overwrite row 2 onwards in only our null columns with values from "EZ19"
df.loc[2:, all_nulls] = df.loc[2:, ["EZ19"]].values

print(df)
     GeoCode    ESP      FIN   USA  EZ19       PRT
1  Geography  Spain  Finland   USA    EZ  Portugal
2  31-Mar-15   0.89     0.89  0.26  0.89      0.89
3  30-Jun-15   0.90     0.90   NaN  0.90      0.90
4  30-Sep-15   0.90     0.90  0.31  0.90      0.90
5  31-Dec-15   0.91     0.91  0.41  0.91      0.91

Upvotes: 2

sammywemmy
sammywemmy

Reputation: 28644

Not sure if this is what you have in mind:

outcome = df.loc[2:, df.loc[2:].isna().all()].mask(
    lambda df: df.isna(), df.loc[2:, "EZ19"], axis=0
)

outcome

    ESP     FIN     PRT
2   0.89    0.89    0.89
3   0.90    0.90    0.90
4   0.90    0.90    0.90
5   0.91    0.91    0.91

df.update(outcome)

df

    GeoCode     ESP     FIN     USA     EZ19    PRT
1   Geography   Spain   Finland USA     EZ  Portugal
2   31-Mar-15   0.89    0.89    0.26    0.89    0.89
3   30-Jun-15   0.90    0.90    NaN     0.90    0.90
4   30-Sep-15   0.90    0.90    0.31    0.90    0.90
5   31-Dec-15   0.91    0.91    0.41    0.91    0.91

It only fills completely null rows from row 2 downwards; USA is not completely null from row 2, that's why it was not altered.

Upvotes: 1

Related Questions