How to make a rectangular matrix square on pandas dataframe

Question

I have a matrix on the following form (not necessarily square):

   A    B    C    D
A  0   0.2  0.3  0.5
E 0.2  0.6  0.9  0.2
D 0.5  0.3  0.6   0
F 0.1  0.4  0.5  0.3

And I would like to turn it into a square matrix as follows

   A    B    C    D    E    F
A  0   0.2  0.3  0.5  0.2  0.1
B 0.2   0    0   0.3  0.6  0.4
C 0.3   0    0   0.6  0.9  0.5
D 0.5  0.3  0.6   0   0.2  0.3
E 0.2  0.6  0.9  0.2   0    0
F 0.1  0.4  0.5  0.3   0    0

In other words, I would like to expand both rows and columns so that it is a symmetric square matrix (rows and columns are in the same order) and missing values are filled with 0.

I guessed there should be a way to do this easily/efficiently using built in functions of pandas but I am not familiar with the package.

for convenience:

df = pd.DataFrame([[0, 0.2, 0.3, 0.5],
                   [0.2, 0.6, 0.9, 0.2],
                   [0.5, 0.3, 0.6, 0],
                   [0.1, 0.4, 0.5, 0.3]],
                   index=['A', 'E', 'D', 'F'],
                   columns=['A', 'B', 'C', 'D'])

n8yoder · Accepted Answer

Just as you thought you can definitely do this pretty concisely in Pandas.

One way is by using the very nice combine_first method.

result = df.combine_first(df.T).fillna(0.0)

However, in my testing using timeit that clocked in at 3.62 ms ± 29.2 µs per loop which was actually slightly slower than the time I got for your method (3.5 ms ± 28.6 µs per loop).

However, by calculating this more directly in Pandas using the update method I was able to get this down to 2.04 ms ± 17.2 µs per loop µs per loop (~1.7x as fast).

# Find the combination of both indices
full_index = df.index.union(df.columns)
# Resize the DataFrame to include all the rows and columns
all_data = df.reindex(labels=full_index, axis=0).reindex(labels=full_index, axis=1)
# Update any values we have from the transpose 
all_data.update(all_data.T)
# Fill the missing entries
result = all_data.fillna(0.0)

Honestly I wasn't able to get as much of a performance improvement as I thought I might, but both pandas based versions are a little more readable to me at least.

How to make a rectangular matrix square on pandas dataframe

Answers (2)

Related Questions