Reputation: 779
I have a DataFrame with 3 columns, each of which can contain NaN.
I'd like to populate a 4th column based on these 3 by applying an or operation across the columns: if the 1st is not NaN, take its value; otherwise check the 2nd, and so on.
Since NaN is not falsy (bool(NaN) is True), the or operator cannot be used as-is.
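To make the problem concrete, here is a minimal sketch of how Python's or treats NaN. The variable name result is only illustrative.

```python
import math
import numpy as np

# NaN is a non-zero float, so Python treats it as truthy.
print(bool(np.nan))  # True

# Because NaN is truthy, `or` short-circuits and returns NaN itself
# instead of falling through to the next value.
result = np.nan or 2.0
print(math.isnan(result))  # True
```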
Here is the code I came up with, but it's not very Pythonic or pandas-idiomatic. Is there a built-in function that does this, or do you have any other suggestions?
import pandas as pd
import numpy as np
nan = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
df = pd.DataFrame({"a": [nan, 1, nan], "b": [2, nan, nan], "c": [nan, nan, 3]})
# a b c
# 0 NaN 2.0 NaN
# 1 1.0 NaN NaN
# 2 NaN NaN 3.0
nan_to_false = lambda val: False if pd.isna(val) else val
df["a_or_b_or_c"] = df.apply(lambda row: nan_to_false(row["a"]) or nan_to_false(row["b"]) or nan_to_false(row["c"]), axis=1)
# 0 2.0
# 1 1.0
# 2 3.0
Upvotes: 1
Views: 451
Reputation: 111
It seems that there is only one non-missing value per row. If that is the case, you can try this trick:
import pandas as pd
import numpy as np
nan = np.nan  # np.NaN was removed in NumPy 2.0; np.nan works in all versions
df = pd.DataFrame({"a": [nan, 1, nan], "b": [2, nan, nan], "c": [nan, nan, 3]})
# Row-wise maximum; NaN values are skipped by default
df["a_or_b_or_c"] = df[["a", "b", "c"]].max(axis=1)
#      a    b    c  a_or_b_or_c
# 0  NaN  2.0  NaN          2.0
# 1  1.0  NaN  NaN          1.0
# 2  NaN  NaN  3.0          3.0
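Note that the maximum only matches "first non-NaN value" when each row has at most one non-missing value. A small sketch with a made-up frame (df2 and its values are assumptions for illustration, not from the question) shows where the two diverge:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the second row has two non-missing values.
df2 = pd.DataFrame({"a": [np.nan, 5.0], "b": [2.0, 9.0], "c": [np.nan, np.nan]})
cols = ["a", "b", "c"]

row_max = df2[cols].max(axis=1)                   # largest value per row
first_valid = df2[cols].bfill(axis=1).iloc[:, 0]  # first non-NaN per row, left to right

print(row_max.tolist())      # [2.0, 9.0]
print(first_valid.tolist())  # [2.0, 5.0]
```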
Upvotes: 0
Reputation: 863166
The idea is to back-fill missing values along each row and then select the first column:
df["all columns"] = df.bfill(axis=1).iloc[:, 0]
If you need to restrict this to specific columns:
df["a_or_b_or_c"] = df[['a','b','c']].bfill(axis=1).iloc[:, 0]
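As a rough check, the sketch below runs the back-fill approach on the question's data plus one extra all-NaN row (that fourth row is an assumption added for illustration). It also compares against chaining Series.combine_first, which is a different technique that should give the same coalesced result here.

```python
import numpy as np
import pandas as pd

# The question's data, plus a hypothetical fourth row where every column is NaN.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, np.nan],
    "b": [2.0, np.nan, np.nan, np.nan],
    "c": [np.nan, np.nan, 3.0, np.nan],
})
cols = ["a", "b", "c"]

# Back-fill across columns, then take the leftmost column.
via_bfill = df[cols].bfill(axis=1).iloc[:, 0]

# Alternative: fill gaps in "a" from "b", then from "c".
via_combine = df["a"].combine_first(df["b"]).combine_first(df["c"])

print(via_bfill.tolist())    # [2.0, 1.0, 3.0, nan]
print(via_combine.tolist())  # [2.0, 1.0, 3.0, nan]
```

A row with no valid value stays NaN in both cases, so no default is silently invented.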
Upvotes: 3