Jabernet

Convert a DataFrame into a 2-mode network matrix

I made a toy DataFrame to play with, containing boards of directors and the directors who sit on those boards. The goal is to create a 2-mode matrix from the DataFrame that I can read into UCINET for statistical analysis.

import pandas as pd

df1 = pd.DataFrame({"Director": ["Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E", "Dir_F", "Dir_E", "Dir_F", "Dir_G","Dir_D"], 
                  "Board": ["Board_W","Board_W","Board_W","Board_X","Board_X","Board_Y","Board_Y","Board_Z","Board_W","Board_W"]})


    Director    Board
0   Dir_A   Board_W
1   Dir_B   Board_W
2   Dir_C   Board_W
3   Dir_D   Board_X
4   Dir_E   Board_X
5   Dir_F   Board_Y
6   Dir_E   Board_Y
7   Dir_F   Board_Z
8   Dir_G   Board_W
9   Dir_D   Board_W

What I want is a 2-mode incidence matrix like this:

        Board_W  Board_X  Board_Y  Board_Z
Dir_A         1        0        0        0
Dir_B         1        0        0        0
Dir_C         1        0        0        0
Dir_D         1        1        0        0
Dir_E         0        1        1        0
Dir_F         0        0        1        1
Dir_G         1        0        0        0

I'm not even sure whether such a thing is possible, but if someone has an idea, that would be great. If not, I would at least like to convert it into a networkx edge list.
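The edge-list fallback needs only pandas. A minimal sketch (the names `pairs` and `edges` are illustrative): each unique Director/Board row becomes one `(director, board)` tuple, and a list of 2-tuples is the kind of edge list networkx accepts, for example via `nx.Graph(edges)` if networkx is installed.

```python
import pandas as pd

df1 = pd.DataFrame({"Director": ["Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E",
                                 "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
                    "Board": ["Board_W", "Board_W", "Board_W", "Board_X", "Board_X",
                              "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"]})

# Drop repeated director/board pairs so each tie appears only once
pairs = df1.drop_duplicates(["Director", "Board"])

# One plain (director, board) tuple per row: a 2-mode edge list
edges = list(pairs[["Director", "Board"]].itertuples(index=False, name=None))
print(edges[:3])  # [('Dir_A', 'Board_W'), ('Dir_B', 'Board_W'), ('Dir_C', 'Board_W')]
```

Every edge joins a director to a board, so the graph is bipartite; nothing in the tuple list itself marks which node belongs to which mode.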


Answers (1)

jezrael

I believe you need get_dummies and then the maximum per director (the first index level), so the matrix is filled with 0 and 1 only:

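# groupby(level=0).max() replaces .max(level=0), which was removed in pandas 2.0;
# dtype=int keeps 0/1 values instead of the True/False default in pandas 2.x
df = pd.get_dummies(df1.set_index('Director')['Board'], dtype=int).groupby(level=0).max()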
print (df)
          Board_W  Board_X  Board_Y  Board_Z
Director                                    
Dir_A           1        0        0        0
Dir_B           1        0        0        0
Dir_C           1        0        0        0
Dir_D           1        1        0        0
Dir_E           0        1        1        0
Dir_F           0        0        1        1
Dir_G           1        0        0        0
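To see why the aggregation step is needed, here is a sketch that splits the one-liner into its two steps (`dummies` and `incidence` are illustrative names). `get_dummies` alone still returns one row per input row, so directors on several boards appear more than once; grouping on the Director index level collapses them.

```python
import pandas as pd

df1 = pd.DataFrame({"Director": ["Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E",
                                 "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
                    "Board": ["Board_W", "Board_W", "Board_W", "Board_X", "Board_X",
                              "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"]})

# Step 1: one indicator column per board. Rows still match the 10 input rows,
# so Dir_D, Dir_E and Dir_F each appear twice.
dummies = pd.get_dummies(df1.set_index("Director")["Board"], dtype=int)
print(dummies.shape)  # (10, 4)

# Step 2: collapse rows sharing a Director label; max() keeps each cell at 0 or 1
incidence = dummies.groupby(level=0).max()
print(incidence.shape)  # (7, 4)
```

Using `max` rather than `sum` in step 2 is what guarantees a binary matrix even if a pair is repeated in the input.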

crosstab also works, but only if every Director/Board pair is unique in the input data, because it counts occurrences:

#first row duplicates the second row (Dir_A, Board_W) - a repeated pair
df1 = pd.DataFrame({"Director": ["Dir_A","Dir_A", "Dir_B", "Dir_C", "Dir_D", 
                                  "Dir_E", "Dir_F", "Dir_E", "Dir_F", "Dir_G","Dir_D"], 
                  "Board": ["Board_W", "Board_W","Board_W","Board_W","Board_X","Board_X",
                            "Board_Y","Board_Y","Board_Z","Board_W","Board_W"]})

df = pd.crosstab(df1['Director'], df1['Board'])
print (df)
Board     Board_W  Board_X  Board_Y  Board_Z
Director                                    
Dir_A           2        0        0        0 <- first value is 2 (because crosstab counts)
Dir_B           1        0        0        0
Dir_C           1        0        0        0
Dir_D           1        1        0        0
Dir_E           0        1        1        0
Dir_F           0        0        1        1
Dir_G           1        0        0        0

#for general data, keep only unique pairs first
df1 = df1.drop_duplicates(['Director','Board'])
df = pd.crosstab(df1['Director'], df1['Board'])
print (df)
Board     Board_W  Board_X  Board_Y  Board_Z
Director                                    
Dir_A           1        0        0        0 <- only 0 and 1 values
Dir_B           1        0        0        0
Dir_C           1        0        0        0
Dir_D           1        1        0        0
Dir_E           0        1        1        0
Dir_F           0        0        1        1
Dir_G           1        0        0        0
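As an alternative to dropping duplicates beforehand, a sketch that keeps the data as-is and caps the counts afterwards (`counts` and `binary` are illustrative names). `DataFrame.clip(upper=1)` turns any positive count into 1 and leaves zeros unchanged.

```python
import pandas as pd

# Toy data with the (Dir_A, Board_W) pair duplicated, as in the crosstab example
df1 = pd.DataFrame({"Director": ["Dir_A", "Dir_A", "Dir_B", "Dir_C", "Dir_D",
                                 "Dir_E", "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
                    "Board": ["Board_W", "Board_W", "Board_W", "Board_W", "Board_X", "Board_X",
                              "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"]})

# Raw co-occurrence counts: Dir_A / Board_W is 2 here
counts = pd.crosstab(df1["Director"], df1["Board"])

# Cap every cell at 1: any positive count means "sits on this board"
binary = counts.clip(upper=1)
print(binary)
```

Both routes give the same 0/1 matrix for this data; capping afterwards is handy if the raw counts are also needed, while dropping duplicates first avoids computing them at all.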
