Rahel Miz

Reputation: 159

Pandas combine two columns into one and exclude NaN values

I have a 5k x 2 DataFrame called "both". I want to create a new 5k x 1 DataFrame or column (either is fine) by replacing any NaN value in one column with the value from the adjacent column.

Example:

    Gains  Loss
0    NaN   NaN
1    NaN -0.17
2    NaN -0.13
3    NaN -0.75
4    NaN -0.17
5    NaN -0.99
6   1.06   NaN
7    NaN -1.29
8    NaN -0.42
9   0.14   NaN

So, for example, I need to replace the NaNs in the first column in rows 1 through 5 with the values from the same rows of the second column, to get a new DataFrame of the following form:

    Change  
0     NaN  
1    -0.17 
2    -0.13  
3    -0.75 
4    -0.17  
5    -0.99  
6    1.06  

How do I tell Python to do this?

Upvotes: 4

Views: 5629

Answers (3)

MarianD

Reputation: 14131

You may fill the NaN values with zeroes and then simply add your columns:

both["Change"] = both["Gains"].fillna(0) + both["Loss"].fillna(0)

Then, if you need to, you can turn the resulting zeroes back into NaNs (note that this also turns any genuine zero result into NaN):

# requires: import numpy as np
both["Change"] = both["Change"].replace(0, np.nan)

The result:

    Gains      Loss  Change
0     NaN       NaN     NaN
1     NaN     -0.17   -0.17
2     NaN     -0.13   -0.13
3     NaN     -0.75   -0.75
4     NaN     -0.17   -0.17
5     NaN     -0.99   -0.99
6    1.06       NaN    1.06
7     NaN     -1.29   -1.29
8     NaN     -0.42   -0.42
9    0.14       NaN    0.14

Finally, if you want to get rid of your original columns, you may drop them:

both.drop(columns=["Gains", "Loss"], inplace=True)
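
If a real zero might occur in either column, a variation that avoids the zero round trip is to fill one column directly from the other. The following is only a minimal sketch: the small frame below is made-up sample data (including a hypothetical row with a genuine 0.0), not the asker's 5k-row frame.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the last row holds a genuine 0.0 gain (hypothetical).
both = pd.DataFrame({
    "Gains": [np.nan, np.nan, 1.06, 0.0],
    "Loss": [np.nan, -0.17, np.nan, np.nan],
})

# Keep "Gains" where it is present, otherwise fall back to "Loss" on the same row.
# Rows where both columns are NaN stay NaN, and a real 0.0 is preserved.
both["Change"] = both["Gains"].fillna(both["Loss"])

print(both)
```

Unlike the sum, this takes "Gains" whenever both columns are populated, so it only matches the addition approach when at most one of the two values is present per row.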

Upvotes: 5

Umar.H

Reputation: 23099

If I understand correctly, we can flag the rows where every column is null and sum the columns for the remaining rows to build your new DataFrame.

cols = ['Gains','Loss']

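# s is True for rows where every column is null
s = df.isnull().cumsum(axis=1).eq(len(df.columns)).any(axis=1)
# to check only the price columns for nulls, use df[cols].isnull() and len(cols) instead

# sum the remaining rows; rows flagged in s stay NaN after index alignment
df['prices'] = df[cols].loc[~s].sum(axis=1)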

df = df.drop(cols,axis=1)

print(df)

   prices
0     NaN
1   -0.17
2   -0.13
3   -0.75
4   -0.17
5   -0.99
6    1.06
7   -1.29
8   -0.42
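
The same result can probably be reached without building the mask by hand, using the `min_count` argument of `DataFrame.sum`. This is a rough sketch on made-up sample data, assuming the frame only needs the two price columns summed:

```python
import numpy as np
import pandas as pd

# Made-up sample data with the same column names as the question.
df = pd.DataFrame({
    "Gains": [np.nan, np.nan, 1.06],
    "Loss": [np.nan, -0.17, np.nan],
})
cols = ["Gains", "Loss"]

# min_count=1 requires at least one non-NaN value in a row;
# rows with no valid values produce NaN instead of 0.
df["prices"] = df[cols].sum(axis=1, min_count=1)

df = df.drop(columns=cols)
print(df)
```

With the default `min_count=0`, an all-NaN row would sum to 0.0, which is why the mask is needed in the version above.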

Upvotes: 0

Dimitris Thomas

Reputation: 1393

There are many ways to achieve this. One is to use the loc indexer:

import pandas as pd
import numpy as np

df = pd.DataFrame({'Price1': [np.nan,np.nan,np.nan,np.nan,
                              np.nan,np.nan,1.06,np.nan,np.nan],
                   'Price2': [np.nan,-0.17,-0.13,-0.75,-0.17,
                              -0.99,np.nan,-1.29,-0.42]})

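# where Price1 is null, take the value from Price2 on the same row
df.loc[df['Price1'].isnull(), 'Price1'] = df['Price2']
# keep rows 0 through 6 (loc slices include the end label) as a one-column DataFrame
df = df.loc[:6, ['Price1']]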

print(df)

Output:

    Price1
0     NaN
1   -0.17
2   -0.13
3   -0.75
4   -0.17
5   -0.99
6    1.06
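
As one more of those many ways, here is a hedged sketch that back-fills across columns, which may be handy when there are more than two of them. The `Price3` column and the values below are hypothetical, added only to illustrate the idea:

```python
import numpy as np
import pandas as pd

# Made-up sample data; Price3 is a hypothetical extra column.
df = pd.DataFrame({'Price1': [np.nan, np.nan, 1.06, np.nan],
                   'Price2': [np.nan, -0.17, np.nan, np.nan],
                   'Price3': [np.nan, np.nan, np.nan, 0.5]})

# bfill(axis=1) fills each NaN with the next valid value to its right,
# so the first column ends up holding the first non-null value of every row.
change = df.bfill(axis=1).iloc[:, 0].rename('Change')

print(change)
```

This returns a new Series and leaves the original columns untouched; rows with no valid value in any column remain NaN.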

You can find more complex recipes in the pandas Cookbook.

Upvotes: 1
