Reputation:
I have two pandas data frames, A and B, indexed with dates:
>>> A
                a      b      c
Timestamp
2018-02-19   True  False  False
2018-02-20  False   True  False
2018-02-21  False  False   True
and
>>> B
                a      b      d
Timestamp
2018-02-19  False   True   True
2018-02-20  False  False  False
2018-02-21   True   True   True
I want to merge these two data frames such that the merged data frame is the logical OR of each common entry (index, column), and also includes the columns that are unique to each data frame. In this case, the output would be:
>>> C
                a     b      c      d
Timestamp
2018-02-19   True  True  False   True
2018-02-20  False  True  False  False
2018-02-21   True  True   True   True
Is there a way to do this in pandas?
Upvotes: 2
Views: 1256
Reputation: 641
There's probably a more elegant and generalizable solution, but this will work for the simple example you've given.
import pandas as pd

A = pd.DataFrame({'a': [True, False, False],
                  'b': [False, True, False],
                  'c': [False, False, True]},
                 index=['a', 'b', 'c'])
B = pd.DataFrame({'a': [False, False, True],
                  'b': [True, False, True],
                  'd': [True, False, True]},
                 index=['a', 'b', 'c'])
# OR the frames, keep the shared columns, then append the unique ones
C = pd.concat([(A | B)[['a', 'b']], A['c'], B['d']], axis=1)
print(C)
       a      b      c      d
a   True   True  False   True
b  False   True  False  False
c   True   True   True   True
This ORs the two frames, which produces the correct result for the columns in common (a, b) but NaN for columns c and d, each of which exists in only one frame. So we slice out columns a and b, then concatenate them with the original c and d, which need no OR and are therefore taken unchanged.
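A slightly more defensive variant, sketched below on the date-indexed frames from the question, slices the shared columns before applying the OR, so it does not depend on what pandas puts in the non-shared columns after alignment. The `shared` list is hardcoded here, on the assumption that the column names are known in advance.

```python
import pandas as pd

# Rebuild the frames from the question, indexed by date
dates = pd.DatetimeIndex(['2018-02-19', '2018-02-20', '2018-02-21'],
                         name='Timestamp')
A = pd.DataFrame({'a': [True, False, False],
                  'b': [False, True, False],
                  'c': [False, False, True]}, index=dates)
B = pd.DataFrame({'a': [False, False, True],
                  'b': [True, False, True],
                  'd': [True, False, True]}, index=dates)

# Columns present in both frames (hardcoded for this example)
shared = ['a', 'b']

# OR only the shared columns, then append the columns unique to each frame
C = pd.concat([A[shared] | B[shared], A['c'], B['d']], axis=1)
print(C)
```

The printed frame should match the `C` shown in the question.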
EDIT: Per your comment, here is a more generalized solution, which saves you from having to know or hardcode the specific column names.
# Get all column names
# (`Index.union` / `Index.intersection` are used because newer pandas
# versions no longer treat `|` and `&` on an Index as set operations)
all_columns = A.columns.union(B.columns)
# Get column names in common
common = A.columns.intersection(B.columns)
# Get column names unique to each frame
only_A = A.columns.difference(B.columns)
only_B = B.columns.difference(A.columns)
# Logical-or common columns, and concatenate the unique columns
C = pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)
# If column names get disordered by the concatenation, use
# `all_columns` to reorder
print(C[all_columns])
       a      b      c      d
a   True   True  False   True
b  False   True  False  False
c   True   True   True   True
EDIT 2: Per kmundnic's final solution, here is an updated version that works on more than two data frames.
# For Python 3
from functools import reduce
# A third data frame
C = pd.DataFrame({'a':[False, False, False],
'b':[True, True, False],
'e': [True, True, True]},
index=['a','b','c'])
def logical_merge(A, B):
    # Get all column names
    all_columns = A.columns.union(B.columns)
    # Get column names in common
    common = A.columns.intersection(B.columns)
    # Get column names unique to each frame
    only_B = [x for x in B.columns if x not in common]
    only_A = [x for x in A.columns if x not in common]
    # Logical-or common columns, and concatenate the unique columns
    return pd.concat([A[common] | B[common], A[only_A], B[only_B]], axis=1)[all_columns]
frames = [A, B, C]
print(reduce(logical_merge, frames))
       a      b      c      d     e
a   True   True  False   True  True
b  False   True  False  False  True
c   True   True   True   True  True
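As a possible alternative to slicing and concatenating, each frame can first be reindexed to the union of all column names, with the missing columns filled with `False`, after which a plain `|` covers every column. This is a different technique from the one above, offered only as a sketch: the helper name `logical_or_frames` is made up for illustration, and it assumes that all frames share the same index and contain only boolean values.

```python
from functools import reduce

import pandas as pd


def logical_or_frames(frames):
    """OR boolean frames element-wise, treating missing columns as False."""
    # Union of the column names of all frames
    all_columns = reduce(lambda acc, f: acc.union(f.columns),
                         frames[1:], frames[0].columns)
    # Give every frame the same columns; absent ones become all-False
    aligned = [f.reindex(columns=all_columns, fill_value=False)
               for f in frames]
    # With identical labels on every frame, `|` is a plain element-wise OR
    return reduce(lambda x, y: x | y, aligned)


idx = ['a', 'b', 'c']
A = pd.DataFrame({'a': [True, False, False],
                  'b': [False, True, False],
                  'c': [False, False, True]}, index=idx)
B = pd.DataFrame({'a': [False, False, True],
                  'b': [True, False, True],
                  'd': [True, False, True]}, index=idx)
C = pd.DataFrame({'a': [False, False, False],
                  'b': [True, True, False],
                  'e': [True, True, True]}, index=idx)

result = logical_or_frames([A, B, C])
print(result)
```

Filling with `False` works because `False` is the identity for OR, so a column that a frame lacks cannot change the outcome.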
Upvotes: 2