Reputation: 431
I made a toy data frame with which to play, with boards of directors and directors that sit on those boards. The goal is to create a 2-mode matrix from the data frame that I can read into UCINET for statistical analysis.
import pandas as pd

df1 = pd.DataFrame({"Director": ["Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E", "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
                    "Board": ["Board_W", "Board_W", "Board_W", "Board_X", "Board_X", "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"]})
Director Board
0 Dir_A Board_W
1 Dir_B Board_W
2 Dir_C Board_W
3 Dir_D Board_X
4 Dir_E Board_X
5 Dir_F Board_Y
6 Dir_E Board_Y
7 Dir_F Board_Z
8 Dir_G Board_W
9 Dir_D Board_W
What I want out is a 2-mode incidence matrix, like this:
Board_W Board_X Board_Y Board_Z
Dir_A 1 0 0 0
Dir_B 1 0 0 0
Dir_C 1 0 0 0
Dir_D 1 1 0 0
Dir_E 0 1 1 0
Dir_F 0 0 1 1
Dir_G 1 0 0 0
I'm not sure whether this is possible, but any ideas would be appreciated. Failing that, converting the data frame into a networkx edgelist would also help.
Upvotes: 1
Views: 144
Reputation: 862771
I believe you need get_dummies, followed by max over the first index level (Director), so that the matrix contains only 0 and 1:
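# max(level=0) was removed in pandas 2.0; groupby(level=0).max() is the equivalent
# dtype=int keeps the 0/1 output (pandas 2.x returns booleans by default)
df = pd.get_dummies(df1.set_index('Director')['Board'], dtype=int).groupby(level=0).max()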
print (df)
Board_W Board_X Board_Y Board_Z
Director
Dir_A 1 0 0 0
Dir_B 1 0 0 0
Dir_C 1 0 0 0
Dir_D 1 1 0 0
Dir_E 0 1 1 0
Dir_F 0 0 1 1
Dir_G 1 0 0 0
crosstab also works, but it returns a 0/1 matrix only if every (Director, Board) pair is unique in the input data, because it counts occurrences:
# repeat the first row to create a duplicated (Dir_A, Board_W) pair
df1 = pd.DataFrame({"Director": ["Dir_A","Dir_A", "Dir_B", "Dir_C", "Dir_D",
"Dir_E", "Dir_F", "Dir_E", "Dir_F", "Dir_G","Dir_D"],
"Board": ["Board_W", "Board_W","Board_W","Board_W","Board_X","Board_X",
"Board_Y","Board_Y","Board_Z","Board_W","Board_W"]})
df = pd.crosstab(df1['Director'], df1['Board'])
print (df)
Board Board_W Board_X Board_Y Board_Z
Director
Dir_A 2 0 0 0 <- first value is 2 (because crosstab counts occurrences)
Dir_B 1 0 0 0
Dir_C 1 0 0 0
Dir_D 1 1 0 0
Dir_E 0 1 1 0
Dir_F 0 0 1 1
Dir_G 1 0 0 0
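One possible way to handle the counts, shown here only as a sketch, is to cap them at 1 after the fact. It assumes the number of repeated pairs is not needed later; the names `counts` and `incidence` are illustrative and not part of the original code.

```python
import pandas as pd

# Same toy data as above, including the duplicated (Dir_A, Board_W) pair
df1 = pd.DataFrame({
    "Director": ["Dir_A", "Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E",
                 "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
    "Board": ["Board_W", "Board_W", "Board_W", "Board_W", "Board_X", "Board_X",
              "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"],
})

# crosstab counts how often each pair occurs, so Dir_A / Board_W is 2
counts = pd.crosstab(df1["Director"], df1["Board"])

# clip(upper=1) caps every count at 1, leaving a 0/1 incidence matrix
incidence = counts.clip(upper=1)
print(incidence)
```

Removing duplicate pairs before calling crosstab gives the same matrix; clipping is just a shorter route when the original rows should stay untouched.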
# for general data, drop duplicated pairs first so that only unique pairs remain
df1 = df1.drop_duplicates(['Director','Board'])
df = pd.crosstab(df1['Director'], df1['Board'])
print (df)
Board Board_W Board_X Board_Y Board_Z
Director
Dir_A 1 0 0 0 <- only 0, 1 values
Dir_B 1 0 0 0
Dir_C 1 0 0 0
Dir_D 1 1 0 0
Dir_E 0 1 1 0
Dir_F 0 0 1 1
Dir_G 1 0 0 0
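For the edgelist fallback mentioned in the question, a minimal sketch using pandas only might look like the following. It assumes an edge list simply means one (director, board) tuple per unique membership; the name `edges` is illustrative.

```python
import pandas as pd

# Toy data from the question (all pairs are already unique here)
df1 = pd.DataFrame({
    "Director": ["Dir_A", "Dir_B", "Dir_C", "Dir_D", "Dir_E",
                 "Dir_F", "Dir_E", "Dir_F", "Dir_G", "Dir_D"],
    "Board": ["Board_W", "Board_W", "Board_W", "Board_X", "Board_X",
              "Board_Y", "Board_Y", "Board_Z", "Board_W", "Board_W"],
})

# One plain (director, board) tuple per unique pair, in input order
edges = list(
    df1[["Director", "Board"]]
    .drop_duplicates()
    .itertuples(index=False, name=None)
)
print(edges[:3])
```

If networkx is installed, a list like this can be passed to `Graph.add_edges_from`. Marking which nodes are directors and which are boards, as a bipartite graph needs, would be a further step not shown here.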
Upvotes: 1