sam

Reputation: 19164

combine 2 columns of dataframe based on a condition

I have created a DataFrame:

import pandas as pd

data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13], [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns = ['min', 'max'])
print(df)

My dataset looks like this:

min    max
Nan    10
4      Nan
Nan    12
Nan    13
5      Nan
6      Nan
7      Nan
Nan    8

I want to create a new column that takes one value from min, then one value from max, and so on. If min or max has two or more consecutive values (for example, 12 and 13 in max), I should keep only the first one (keep 12, then move on to the next min value).

In short, the new column should have one min value, then one max value, and so on.

The output should be:

combined
10
4
12
5
8

Upvotes: 0

Views: 193

Answers (4)

SAK

Reputation: 146

If my assumptions are correct, this should work:

  1. The missing values are the string 'Nan', not np.nan.
  2. If the min column holds 'Nan', the max column holds a number, and vice versa, so no row has two numbers.

import numpy as np
import pandas as pd


data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13], [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns = ['min', 'max'])


df['combined'] = np.where(df['min']!='Nan', df['min'], df['max'])

This is the output I get:

   min  max combined
0  Nan   10       10
1    4  Nan        4
2  Nan   12       12
3  Nan   13       13
4    5  Nan        5
5    6  Nan        6
6    7  Nan        7
7  Nan    8        8

Upvotes: 0

SeaBean

Reputation: 23217

You can use .where() to set to NaN every min or max value whose previous row in the same column is not NaN. Then drop the rows where both min and max are NaN, and fill the remaining NaN values in min with the max value from the same row using .combine_first():

df = df.replace('Nan', np.nan)

df['min'] = df['min'].where(df['min'].shift().isna())
df['max'] = df['max'].where(df['max'].shift().isna())
df = df.dropna(how='all')
df['combined'] = df['min'].combine_first(df['max'])

Result:

print(df)


   min   max  combined
0  NaN  10.0      10.0
1  4.0   NaN       4.0
2  NaN  12.0      12.0
4  5.0   NaN       5.0
7  NaN   8.0       8.0

Upvotes: 2

Shubham Sharma

Reputation: 71689

Stack the DataFrame to reshape it into a MultiIndex Series, reset the level-1 index, then use boolean indexing to select only the rows where a min is followed by a max or vice versa:

s = df[df != 'Nan'].stack().reset_index(name='combined', level=1)
m = s['level_1'] != s['level_1'].shift()
s[m].drop(columns='level_1')

   combined
0      10.0
1       4.0
2      12.0
4       5.0
7       8.0
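
The same idea, keeping a row only when its value comes from a different column than the previous row's, can be sketched without stacking. This is a minimal, self-contained variant rather than the code above; the names source, keep and result are illustrative, and it assumes every row holds exactly one number once 'Nan' is converted to a real NaN.

```python
import pandas as pd

data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13],
        [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns=['min', 'max'])

# Convert the 'Nan' strings to real NaN (assumption: all other entries are numeric).
df = df.apply(pd.to_numeric, errors='coerce')

# Label each row with the column that holds its value ('min' or 'max').
source = df.notna().idxmax(axis=1)

# Keep a row only when its source column differs from the previous row's.
keep = source != source.shift()

# Take the single non-NaN value per row, then filter with the mask.
values = df['min'].combine_first(df['max'])
result = values[keep].to_frame('combined')
print(result)
```

The original index labels of the kept rows are preserved, so the result can be joined back to df if needed.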

Upvotes: 1

Minura Punchihewa

Reputation: 2025

You can pick the column that the first value should come from, for example 'max', and use it as the starting key. Then iterate through the DataFrame, appending each value to a list and switching the key after every append. You also have to skip NaN values along the way, since the data contains many of them:

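# Assumes the 'Nan' strings were already replaced with real NaN,
# e.g. df = df.replace('Nan', np.nan)
combined = []
key = 'max'  # column to take the first value from
for index, row in df.iterrows():
  if pd.notna(row[key]):
    combined.append(row[key])
    # switch to the other column for the next value
    key = 'min' if key == 'max' else 'max'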

Here, I have hardcoded the starting column. If you do not want to do that, you can check which column in the first row holds an actual value rather than NaN, and use that column as the initial key.
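
One possible way to choose the starting column automatically is sketched below. It assumes the 'Nan' strings are first converted to real NaN (here with pd.to_numeric, which is my choice and not part of the loop above), and that the first row has a value in one of the two columns.

```python
import pandas as pd

data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13],
        [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns=['min', 'max'])

# Convert the 'Nan' strings to real NaN (assumption: all other entries are numeric).
df = df.apply(pd.to_numeric, errors='coerce')

# Start from whichever column holds a real value in the first row.
first_row = df.iloc[0]
key = 'min' if pd.notna(first_row['min']) else 'max'

combined = []
for _, row in df.iterrows():
    if pd.notna(row[key]):
        combined.append(row[key])
        # Alternate between the two columns after each accepted value.
        key = 'min' if key == 'max' else 'max'

print(combined)
```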

Note: I have added the data to a list, because I am not sure how you plan to include this as a column when the lengths will be different.
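
If the list does need to live next to the original data, two options are sketched here. The combined list is written out by hand as an example of what the loop yields once the missing values are real NaN; padding with NaN is only one possible convention.

```python
import pandas as pd

data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13],
        [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns=['min', 'max'])

# Example result of the loop (assumption: missing values were real NaN).
combined = [10, 4, 12, 5, 8]

# Option 1: keep the result as its own, shorter Series.
combined_series = pd.Series(combined, name='combined')

# Option 2: attach it to df; rows past the end of the list are padded with NaN.
padded = pd.concat([df, combined_series], axis=1)
print(padded)
```

With the second option the values sit in the first five rows and no longer line up with the rows they came from.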

Upvotes: 0
