np2020
np2020

Reputation: 140

How to turn pandas series into a matrix?

I have the below series and would like to turn it into a matrix with the Index column split into individual letters.

I am unsure of the easiest way to generate the matrix would anyone be able to help with how to do this?

Series:

A-B    10
A-C    20
A-D    30
B-C    40
B-D    50
C-D    60

Desired:

    A   B   C   D
A   1  10  20  30
B  10   1  40  50
C  20  40   1  60
D  30  50  60   1

I have split out the index of the series into unique values with the following:

set([item for sublist in [i.split('-') for i in df.index.tolist()] for item in sublist])

Sample data:


df = pd.Series([10,20,30,40,50,60],index=["A-B","A-C","A-D","B-C","B-D","C-D"])

pd.DataFrame([[1,10,20,30],[10,1,40,50],[20,40,1,60],[30,50,60,1]],index = ["A","B", "C", "D"], columns = ["A","B", "C", "D"])

Upvotes: 1

Views: 740

Answers (2)

Paulina Khew
Paulina Khew

Reputation: 407

At the top of my solution, I did two imports

import pandas as pd
import string

Instead of putting the indices and values into a series, I put them into lists

index = ["A-B","A-C","A-D","B-C","B-D","C-D"]
values = [10,20,30,40,50,60]

We can initialize two empty arrays in which to store our numeric indices. I mapped the letters to their numeric equivalent starting the index at 0 so we can populate a matrix easily. So A=0, B=1, C=2, etc.

x = []
y = []
for idx in index:
    split = idx.split("-")
    x.append(string.ascii_lowercase.index(split[0].lower()))
    y.append(string.ascii_lowercase.index(split[1].lower()))

To get the other combinations, I just double the values list, and append x and y to y and x, respectively.

values = [10,20,30,40,50,60] *2
x_copy = x
x = x + y
y = y + x_copy

Now, I create my final matrix, which is 4*4 and has the default value of 1. Looping through each value in the values list, I populate the final matrix by assigning values to final[i][j].

final = [[1]*4 for _ in range(4)]
for i in range(len(values)):
    final[x[i]][y[i]] = values[i]

Now, I create a list of column/index names, ['A', 'B', 'C', 'D'] and initialize my DataFrame.

names = [chr(ord('@')+1+x) for x in range(4)]
df = pd.DataFrame(final, columns=names, index=names)

This is my final output df image

Upvotes: 0

Max Pierini
Max Pierini

Reputation: 2249

If you got

sr = pd.Series([10,20,30,40,50,60],index=["A-B","A-C","A-D","B-C","B-D","C-D"])

you can reset the series index to get a DataFrame and rename the columns

df = sr.reset_index()
df.columns = ['row_col', 'data']

that gives

  row_col  data
0     A-B    10
1     A-C    20
2     A-D    30
3     B-C    40
4     B-D    50
5     C-D    60

Now, complete missing combinations

df_inv = pd.DataFrame({
    'row_col': df.row_col.str[::-1],
    'data': df.data
})
df_values = df.append(df_inv)

yielding

  row_col  data
0     A-B    10
1     A-C    20
2     A-D    30
3     B-C    40
4     B-D    50
5     C-D    60
0     B-A    10
1     C-A    20
2     D-A    30
3     C-B    40
4     D-B    50
5     D-C    60

Now you can split the row_col column with expand into two new columns

df_values[['row', 'col']] = df_values.row_col.str.split('-', 1, expand=True)

and we get

  row_col  data row col
0     A-B    10   A   B
1     A-C    20   A   C
2     A-D    30   A   D
3     B-C    40   B   C
4     B-D    50   B   D
5     C-D    60   C   D
0     B-A    10   B   A
1     C-A    20   C   A
2     D-A    30   D   A
3     C-B    40   C   B
4     D-B    50   D   B
5     D-C    60   D   C

Finally, we can pivot to get "matrix" (pivot table)

df_piv = df_values[['row', 'col', 'data']].pivot(
    index='row', columns='col'
).fillna(1).astype(int)

and we get the wanted table

    data            
col    A   B   C   D
row                 
A      1  10  20  30
B     10   1  40  50
C     20  40   1  60
D     30  50  60   1

Upvotes: 1

Related Questions