Reputation: 140
I have the below series and would like to turn it into a matrix with the Index column split into individual letters.
I am unsure of the easiest way to generate the matrix would anyone be able to help with how to do this?
Series:
A-B 10
A-C 20
A-D 30
B-C 40
B-D 50
C-D 60
Desired:
A B C D
A 1 10 20 30
B 10 1 40 50
C 20 40 1 60
D 30 50 60 1
I have split out the index of the series into unique values with the following:
set([item for sublist in [i.split('-') for i in df.index.tolist()] for item in sublist])
Sample data:
df = pd.Series([10,20,30,40,50,60],index=["A-B","A-C","A-D","B-C","B-D","C-D"])
pd.DataFrame([[1,10,20,30],[10,1,40,50],[20,40,1,60],[30,50,60,1]],index = ["A","B", "C", "D"], columns = ["A","B", "C", "D"])
Upvotes: 1
Views: 740
Reputation: 407
At the top of my solution, I did two imports
import pandas as pd
import string
Instead of putting the indices and values into a series, I put them into lists
index = ["A-B","A-C","A-D","B-C","B-D","C-D"]
values = [10,20,30,40,50,60]
We can initialize two empty arrays in which to store our numeric indices. I mapped the letters to their numeric equivalent starting the index at 0
so we can populate a matrix easily. So A=0
, B=1
, C=2
, etc.
x = []
y = []
for idx in index:
split = idx.split("-")
x.append(string.ascii_lowercase.index(split[0].lower()))
y.append(string.ascii_lowercase.index(split[1].lower()))
To get the other combinations, I just double the values
list, and append x
and y
to y
and x
, respectively.
values = [10,20,30,40,50,60] *2
x_copy = x
x = x + y
y = y + x_copy
Now, I create my final
matrix, which is 4*4 and has the default value of 1
. Looping through each value in the values
list, I populate the final
matrix by assigning values to final[i][j]
.
final = [[1]*4 for _ in range(4)]
for i in range(len(values)):
final[x[i]][y[i]] = values[i]
Now, I create a list of column/index names, ['A', 'B', 'C', 'D']
and initialize my DataFrame.
names = [chr(ord('@')+1+x) for x in range(4)]
df = pd.DataFrame(final, columns=names, index=names)
Upvotes: 0
Reputation: 2249
If you got
sr = pd.Series([10,20,30,40,50,60],index=["A-B","A-C","A-D","B-C","B-D","C-D"])
you can reset the series index to get a DataFrame and rename the columns
df = sr.reset_index()
df.columns = ['row_col', 'data']
that gives
row_col data
0 A-B 10
1 A-C 20
2 A-D 30
3 B-C 40
4 B-D 50
5 C-D 60
Now, complete missing combinations
df_inv = pd.DataFrame({
'row_col': df.row_col.str[::-1],
'data': df.data
})
df_values = df.append(df_inv)
yielding
row_col data
0 A-B 10
1 A-C 20
2 A-D 30
3 B-C 40
4 B-D 50
5 C-D 60
0 B-A 10
1 C-A 20
2 D-A 30
3 C-B 40
4 D-B 50
5 D-C 60
Now you can split the row_col
column with expand
into two new columns
df_values[['row', 'col']] = df_values.row_col.str.split('-', 1, expand=True)
and we get
row_col data row col
0 A-B 10 A B
1 A-C 20 A C
2 A-D 30 A D
3 B-C 40 B C
4 B-D 50 B D
5 C-D 60 C D
0 B-A 10 B A
1 C-A 20 C A
2 D-A 30 D A
3 C-B 40 C B
4 D-B 50 D B
5 D-C 60 D C
Finally, we can pivot
to get "matrix" (pivot table)
df_piv = df_values[['row', 'col', 'data']].pivot(
index='row', columns='col'
).fillna(1).astype(int)
and we get the wanted table
data
col A B C D
row
A 1 10 20 30
B 10 1 40 50
C 20 40 1 60
D 30 50 60 1
Upvotes: 1