Gino

Reputation: 779

Pandas: `or` operation on NaN values

I have a DataFrame with 3 columns, each of which may contain NaN. I'd like to populate a 4th column from these 3 by applying an `or`-like operation across them: if the 1st is not NaN, take its value; otherwise check the 2nd, and so on. Since NaN is not falsy, the `or` operator cannot be used as-is. Here is the code I came up with, but it's not very Pythonic or pandas-idiomatic. Is there a built-in function that does this, or do you have any other suggestions?

import pandas as pd
import numpy as np

nan = np.nan  # the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({"a": [nan, 1, nan], "b": [2, nan, nan], "c": [nan, nan, 3]})
#   a   b   c
# 0 NaN 2.0 NaN
# 1 1.0 NaN NaN
# 2 NaN NaN 3.0

nan_to_false = lambda val: False if pd.isna(val) else val

df["a_or_b_or_c"] = df.apply(lambda row: nan_to_false(row["a"]) or nan_to_false(row["b"]) or nan_to_false(row["c"]), axis=1)
# 0    2.0
# 1    1.0
# 2    3.0

Upvotes: 1

Views: 451

Answers (2)

Joseph Konka

Reputation: 111

It seems that each row contains only one non-missing value. If so, you can try this trick:

import pandas as pd
import numpy as np

nan = np.nan  # the np.NaN alias was removed in NumPy 2.0
df = pd.DataFrame({"a": [nan, 1, nan], "b": [2, nan, nan], "c": [nan, nan, 3]})

fn = lambda x: np.max(x)
df["a_or_b_or_c"] = df[["a", "b", "c"]].apply(fn, axis=1)

# Output:
     a    b    c  a_or_b_or_c
0  NaN  2.0  NaN          2.0
1  1.0  NaN  NaN          1.0
2  NaN  NaN  3.0          3.0
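
Note that this trick depends on the assumption that each row has at most one non-missing value. The following is a minimal sketch, using a made-up frame that is not from the question, of how the row-wise maximum can differ from a left-to-right "first non-NaN" result once a row holds several values:

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration: row 0 has two non-missing values.
df = pd.DataFrame({"a": [1.0, np.nan], "b": [5.0, np.nan], "c": [np.nan, 3.0]})
cols = ["a", "b", "c"]

# Largest value in each row; NaN is skipped by default.
row_max = df[cols].max(axis=1)

# First non-NaN value in each row, scanning left to right.
first_valid = df[cols].bfill(axis=1).iloc[:, 0]

print(row_max.tolist())      # [5.0, 3.0]
print(first_valid.tolist())  # [1.0, 3.0]
```

If the data really never has more than one value per row, both give the same answer; otherwise only the second matches the `or` semantics asked for.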

Upvotes: 0

jezrael

Reputation: 863166

The idea is to back-fill missing values along each row and then select the first column:

df["all columns"] = df.bfill(axis=1).iloc[:, 0]

If you need to restrict this to specific columns:

df["a_or_b_or_c"] = df[['a','b','c']].bfill(axis=1).iloc[:, 0]
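
As a self-contained sketch, here is the same approach run on the frame from the question, followed by two edge cases. The `edge` frame is hypothetical and only there to show the behaviour: a genuine 0 is kept (the lambda-based `or` in the question would skip it, since 0 is falsy), and a row with no values at all stays NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [np.nan, 1, np.nan],
                   "b": [2, np.nan, np.nan],
                   "c": [np.nan, np.nan, 3]})

# Back-fill along each row so the first column ends up holding
# the first non-NaN value found when reading left to right.
df["a_or_b_or_c"] = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
print(df["a_or_b_or_c"].tolist())  # [2.0, 1.0, 3.0]

# Hypothetical edge cases: row 0 starts with a real 0, row 1 is all NaN.
edge = pd.DataFrame({"a": [0.0, np.nan],
                     "b": [7.0, np.nan],
                     "c": [np.nan, np.nan]})
result = edge.bfill(axis=1).iloc[:, 0]
print(result.tolist())  # [0.0, nan]
```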

Upvotes: 3
