user14073111
user14073111

Reputation: 621

Create a new row in a dataframe based only for non NaN values from a column

Lets say i have i dataframe like this one:

      col1       col2         col3
0     data1   Completed       Fail
1     data2   Completed       NaN
2     data3   Completed    Completed
3     data4   Completed       NaN
4     data5      NaN          NaN

How can i add an extra row for each time the value in col3 is not NaN and have a dataframe like this:

      col1     status           
0     data1   Completed 
1     data1      Fail
2     data2   Completed     
3     data3   Completed    
4     data3   Completed
5     data4   Completed      
6     data5      NaN        

I tried this but im not getting the desirable output:

df  = df.melt(id_vars=['col1'],  
        value_name="status")

Upvotes: 1

Views: 392

Answers (1)

sophocles
sophocles

Reputation: 13831

IIUC, you can first start by using pd.melt() as you already did but also drop all the null values by chaining dropna(). This will get you close, but not exactly where you want to be:

new = df.melt(id_vars='col1',value_name='status').sort_values(by='col1').dropna().drop('variable',axis=1)

>>> print(new)

    col1     status
0  data1  Completed
5  data1       Fail
1  data2  Completed
2  data3  Completed
7  data3  Completed
3  data4  Completed

At this point, you will need to bring over the rows from your original df that were nan in col2. You can do that using isnull() and pd.concat() respectively:

col2_nan = df.loc[df.col2.isnull()].drop('col3',axis=1).rename(columns = {'col2':'status'})

>>> print(pd.concat([new,col2_nan]).reset_index(drop=True))


    col1     status
0  data1  Completed
1  data1       Fail
2  data2  Completed
3  data3  Completed
4  data3  Completed
5  data4  Completed
6  data5        NaN

Upvotes: 1

Related Questions