Catherine Nosova

How to build a column-by-column DataFrame in pandas

I have a DataFrame that looks like this:

     A  B    C
0    s  s  NaN
1  NaN  x    x

I would like to create a table of intersections between the columns, like this:

       A     B      C
A   True  True  False
B   True  True   True
C  False  True   True

Is there an elegant way to do this without loops?

Thank you!

Answers (1)

piRSquared

Setup

import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=['s', np.nan], B=['s', 'x'], C=[np.nan, 'x']))

Option 1
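You can use NumPy broadcasting to compare each column with every other column, row by row, and then check whether any of the comparisons is True. Because NaN never compares equal to anything, not even another NaN, missing values never count as an intersection.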

v = df.values

pd.DataFrame(
    (v[:, :, None] == v[:, None]).any(0),
    df.columns, df.columns
)

       A     B      C
A   True  True  False
B   True  True   True
C  False  True   True
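To see what the broadcast actually compares, here is a minimal sketch that prints the intermediate shapes for the Setup frame. It uses df.to_numpy(), which returns the same array as df.values; the names left, right and eq are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=['s', np.nan], B=['s', 'x'], C=[np.nan, 'x']))
v = df.to_numpy()      # object array, shape (n_rows, n_cols) = (2, 3)

left = v[:, :, None]   # shape (2, 3, 1): column i runs along axis 1
right = v[:, None]     # shape (2, 1, 3): column j runs along axis 2
eq = left == right     # broadcast to (2, 3, 3): eq[r, i, j] is v[r, i] == v[r, j]

print(left.shape, right.shape, eq.shape)
print(eq[0])           # pairwise matches in row 0; the NaN in C matches nothing
print(eq.any(0))       # collapse the row axis: True if the pair matches in any row
```

Each slice eq[r] is a small column-by-column table for one row, so reducing over axis 0 turns the per-row tables into the final one.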

Replacing any with sum gives the number of rows in which each pair of columns holds the same value.

v = df.values

pd.DataFrame(
    (v[:, :, None] == v[:, None]).sum(0),
    df.columns, df.columns
)

   A  B  C
A  1  1  0
B  1  2  1
C  0  1  1
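One way to sanity-check the count table: a diagonal entry compares a column with itself, so it should equal the number of non-missing values in that column (NaN == NaN is False). A minimal sketch on the Setup frame, where diag is just an illustrative name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=['s', np.nan], B=['s', 'x'], C=[np.nan, 'x']))
v = df.to_numpy()

counts = pd.DataFrame(
    (v[:, :, None] == v[:, None]).sum(0),
    df.columns, df.columns,
)

# Entry (i, i) counts rows where column i equals itself, i.e. its non-NaN rows
diag = pd.Series(np.diag(counts), index=df.columns)
print(diag.tolist(), df.notna().sum().tolist())
```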

Alternatively, use np.count_nonzero instead of sum:

v = df.values

pd.DataFrame(
    np.count_nonzero(v[:, :, None] == v[:, None], 0),
    df.columns, df.columns
)

   A  B  C
A  1  1  0
B  1  2  1
C  0  1  1
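Since the count table carries strictly more information, the boolean table from Option 1 can be recovered from it: "matches in at least one row" is the same as "match count greater than zero". A small sketch checking this on the Setup frame (counts and flags are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(dict(A=['s', np.nan], B=['s', 'x'], C=[np.nan, 'x']))
v = df.to_numpy()
eq = v[:, :, None] == v[:, None]   # (rows, cols, cols) pairwise equality

counts = pd.DataFrame(np.count_nonzero(eq, 0), df.columns, df.columns)
flags = pd.DataFrame(eq.any(0), df.columns, df.columns)

# Thresholding the counts reproduces the boolean intersection table
print(counts.gt(0).equals(flags))
```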

Option 2
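A fun, creative alternative: one-hot encode the stacked values with get_dummies, then, for each value, use a matrix product to count the rows in which two columns both hold that value. Passing dtype=int keeps the dummies numeric, since recent pandas versions return booleans by default.

# One row per original row; columns are (value, column) pairs
d = pd.get_dummies(df.stack(), dtype=int).unstack(fill_value=0)
# d[val].T.dot(d[val]) counts rows where both columns hold val; add up over all values
sum(d[val].T.dot(d[val]) for val in d.columns.levels[0])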

   A  B  C
A  1  1  0
B  1  2  1
C  0  1  1
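A quick check on a hypothetical one-row frame (df2, not from the question), in which A and B are both filled but hold different values, shows why only the matching-value blocks of the dummy product should be added up. Adding every (value, column) block together counts rows in which both columns are non-null, not rows in which they are equal; on the question's frame the two happen to coincide. The sketch uses .T.groupby(level=1) in place of groupby(..., axis=1), which is deprecated in recent pandas.

```python
import pandas as pd

# Hypothetical frame: A and B are both non-null in row 0 but hold different values
df2 = pd.DataFrame(dict(A=['s'], B=['x']))

d = pd.get_dummies(df2.stack(), dtype=int).unstack(fill_value=0)
full = d.T.dot(d)  # rows and columns are (value, column) pairs

# Adding up every block, regardless of value: counts rows where both columns are non-null
co_present = full.groupby(level=1).sum().T.groupby(level=1).sum().T

# Adding up only the blocks where the values match: counts rows where the columns are equal
equal = sum(d[val].T.dot(d[val]) for val in d.columns.levels[0])

print(co_present.loc['A', 'B'], equal.loc['A', 'B'])
```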
