Reputation: 19164
I have created a data frame:
import pandas as pd
data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13], [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns=['min', 'max'])
print(df)
My dataset looks like this:
min  max
Nan  10
4    Nan
Nan  12
Nan  13
5    Nan
6    Nan
7    Nan
Nan  8
I want to create a new column that takes one value from min, then one value from max, and so on. If there are two or more consecutive values in the same column (for example, 12 and 13 in max), only the first one should be used (take 12, skip 13, then move on to the next min value).
In short, the new column should alternate: one min value, then one max value, and so on.
The output should be:
combined
10
4
12
5
8
Upvotes: 0
Views: 193
Reputation: 146
If my assumptions are correct, then this should work:
import numpy as np
import pandas as pd
data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13], [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns = ['min', 'max'])
df['combined'] = np.where(df['min']!='Nan', df['min'], df['max'])
This is the output I get:
min max combined
0 Nan 10 10
1 4 Nan 4
2 Nan 12 12
3 Nan 13 13
4 5 Nan 5
5 6 Nan 6
6 7 Nan 7
7 Nan 8 8
Upvotes: 0
Reputation: 23217
You can set the values of min and max whose previous row is not NaN to NaN using .where(). Then remove the rows where both min and max are NaN. Finally, fill the NaN values in min with the value of max in each row using .combine_first():
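import numpy as np
# treat the 'Nan' placeholder strings as real missing values
df = df.replace('Nan', np.nan)
# keep a value only if the previous row of the same column is NaN
df['min'] = df['min'].where(df['min'].shift().isna())
df['max'] = df['max'].where(df['max'].shift().isna())
# drop rows where both columns are now NaN, then merge the two columns
df = df.dropna(how='all')
df['combined'] = df['min'].combine_first(df['max'])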
Result:
print(df)
min max combined
0 NaN 10.0 10.0
1 4.0 NaN 4.0
2 NaN 12.0 12.0
4 5.0 NaN 5.0
7 NaN 8.0 8.0
Upvotes: 2
Reputation: 71689
Stack the dataframe to reshape it into a MultiIndex series, reset the level 1 index, then use boolean indexing to select only the rows where a min value is followed by a max value or vice versa:
s = df[df != 'Nan'].stack().reset_index(name='combined', level=1)
m = s['level_1'] != s['level_1'].shift()
s[m].drop(columns='level_1')
combined
0 10.0
1 4.0
2 12.0
4 5.0
7 8.0
Upvotes: 1
Reputation: 2025
You can define the key for the first value you want to include, for example 'max', then iterate through the DataFrame, appending values to your data structure and switching the key after each one. You also have to check for 'NaN' values, since you have a lot of those:
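combined = []
key = 'max'
for index, row in df.iterrows():
    value = row[key]
    # skip missing entries: a real NaN or the 'Nan' placeholder string
    if pd.notna(value) and value != 'Nan':
        combined.append(value)
        # switch to the other column for the next value
        key = 'min' if key == 'max' else 'max'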
Here, I have hardcoded the first key, but if you do not want to do that, you can check which column in the first row holds an actual value rather than 'NaN' and use that as the starting key.
Note: I have collected the values in a list because I am not sure how you plan to add them as a column when the lengths differ.
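As a rough sketch of the two points above, the starting key can be read from the first row, and the picked values can be kept against their original row labels so they still line up with the DataFrame. The helper name alternate_values and its missing parameter are made up for this illustration, and the sketch assumes a non-empty frame with 'min' and 'max' columns that use the 'Nan' placeholder from the question.

```python
import pandas as pd

data = [['Nan', 10], [4, 'Nan'], ['Nan', 12], ['Nan', 13],
        [5, 'Nan'], [6, 'Nan'], [7, 'Nan'], ['Nan', 8]]
df = pd.DataFrame(data, columns=['min', 'max'])


def alternate_values(frame, missing='Nan'):
    """Hypothetical helper: pick values alternately from 'min' and 'max'."""
    # Turn the placeholder strings into real missing values.
    clean = frame.mask(frame == missing)
    # Start with whichever column has a real value in the first row
    # (falls back to 'max' if the first 'min' entry is missing).
    key = 'min' if pd.notna(clean.iloc[0]['min']) else 'max'
    picked = {}
    for index, row in clean.iterrows():
        if pd.notna(row[key]):
            picked[index] = row[key]
            # Switch to the other column for the next value.
            key = 'min' if key == 'max' else 'max'
    # Keep the original row labels so the result can be aligned with the frame.
    return pd.Series(picked, name='combined')


combined = alternate_values(df)
# Assignment aligns on the index; rows that were skipped get NaN.
df['combined'] = combined
```

Because the Series keeps the original index, assigning it back as a column leaves NaN in the skipped rows. Calling combined.tolist() gives the plain list of values if that is all you need.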
Upvotes: 0