Pythonically create adjacency matrix from spreadsheet

Question

I have a spreadsheet with lists of names of people that a particular person reported working with on a number of projects. If I import it to pandas as a dataframe it will look like this:

       1                  2
Jane   ['Fred', 'Joe']    ['Joe', 'Fred', 'Bob']
Fred   ['Alex']           ['Jane']
Terry  NaN                ['Bob']
Bob    ['Joe']            ['Jane', 'Terry']
Alex   ['Fred']           NaN
Joe    ['Jane']           ['Jane']

I want to create an adjacency matrix that will look like this:

      Jane  Fred  Terry  Bob  Alex  Joe
Jane  0     2     0      1    0     2
Fred  1     0     0      0    1     0
Terry 0     0     0      1    0     0
Bob   1     0     1      0    0     1
Alex  0     1     0      0    0     0
Joe   2     0     0      0    0     0

This matrix, generally, will NOT be symmetric because of inconsistency with people's reports. I have been creating the adjacency matrix just by looping through the dataframe and incrementing the the matrix elements accordingly. Apparently, looping through dataframes is NOT recommended and inefficient, so does anyone have a suggestion on how his could be done more pythonically?

Ibrahim Sherif · Accepted Answer

This is the sample of the data I used to work with.

df = pd.DataFrame({
    'Name': ['Jane', 'Fred', 'Terry', 'Bob', 'Alex', 'Joe'],
    '1':[['Fred', 'Joe'], ['Alex'], np.nan,['Joe'], ['Fred'], ['Jane']],
    '2': [['Joe', 'Fred', 'Bob'], ['Jane'], ['Bob'], ['Jane', 'Terry'], np.nan, ['Jane']]
})

df.head()

    Name            1                 2
0   Jane  [Fred, Joe]  [Joe, Fred, Bob]
1   Fred       [Alex]            [Jane]
2  Terry          NaN             [Bob]
3    Bob        [Joe]     [Jane, Terry]
4   Alex       [Fred]               NaN

I created the adjacency matrix using pandas in three simple steps.

First, I melted the data to have one column only for all the connections between the different names and dropped the variable column.

dff = df.melt(id_vars=['Name']).drop('variable', axis=1)

     Name             value
0    Jane       [Fred, Joe]
1    Fred            [Alex]
2   Terry               NaN
3     Bob             [Joe]
4    Alex            [Fred]
5     Joe            [Jane]
6    Jane  [Joe, Fred, Bob]
7    Fred            [Jane]
8   Terry             [Bob]
9     Bob     [Jane, Terry]
10   Alex               NaN
11    Joe            [Jane]

Secondly, I used the explode method to break down the rows with lists in separate rows.

dff = dff.explode('value')

     Name  value
0    Jane   Fred
0    Jane    Joe
1    Fred   Alex
2   Terry    NaN
3     Bob    Joe
4    Alex   Fred
5     Joe   Jane
6    Jane    Joe
6    Jane   Fred
6    Jane    Bob
7    Fred   Jane
8   Terry    Bob
9     Bob   Jane
9     Bob  Terry
10   Alex    NaN
11    Joe   Jane

Finally, to create the adjacency matrix I used crosstab within pandas which counts the occurrences in the two columns specified only.

pd.crosstab(dff['Name'], dff['value'])

value  Alex  Bob  Fred  Jane  Joe  Terry
Name                                    
Alex      0    0     1     0    0      0
Bob       0    0     0     1    1      1
Fred      1    0     0     1    0      0
Jane      0    1     2     0    2      0
Joe       0    0     0     2    0      0
Terry     0    1     0     0    0      0

Pythonically create adjacency matrix from spreadsheet

Answers (2)

Related Questions