Reputation: 121
I have the following data frame:
Company Firm
125911 1
125911 2
32679 3
32679 5
32679 5
32679 8
32679 10
32679 12
43805 14
67734 8
67734 9
67734 10
67734 10
67734 11
67734 12
67734 13
74240 4
74240 6
74240 7
Where basically the firm makes an investment into the company at a specific year which in this case is the same year for all companies. What I want to do in python is to create a simple adjacency matrix with only 0's and 1's. 1 if two firms has made an investment into the same company. So even if firm 10 and 8 for example have invested in two different firms at the same it will still be a 1. The resulting matrix I am looking for looks like:
Firm 1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 1 0 1 0 1 0 0
4 0 0 0 0 0 1 1 0 0 0 0 0 0 0
5 0 0 1 0 0 0 0 1 0 1 0 1 0 0
6 0 0 0 1 0 0 1 0 0 0 0 0 0 0
7 0 0 0 1 0 1 0 0 0 0 0 0 0 0
8 0 0 1 0 1 0 0 0 1 1 1 1 1 0
9 0 0 0 0 0 0 0 1 0 1 1 1 1 0
10 0 0 1 0 1 0 0 1 1 0 1 1 1 0
11 0 0 0 0 0 0 0 1 1 1 0 1 1 0
12 0 0 1 0 1 0 0 1 1 1 1 0 1 0
13 0 0 0 0 0 0 0 1 1 1 1 1 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0
I have seen similar questions where you can use crosstab
, however in that case each company will only have one row with all the firms in different columns instead. So I am wondering what the best and most efficient way to tackle this specific problem is? Any help is greatly appreciated.
Upvotes: 3
Views: 754
Reputation: 195448
dfs = []
for s in df.groupby("Company").agg(list).values:
dfs.append(pd.DataFrame(index=set(s[0]), columns=set(s[0])).fillna(1))
out = pd.concat(dfs).groupby(level=0).sum().gt(0).astype(int)
np.fill_diagonal(out.values, 0)
print(out)
Prints:
1 2 3 4 5 6 7 8 9 10 11 12 13 14
1 0 1 0 0 0 0 0 0 0 0 0 0 0 0
2 1 0 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 0 0 1 0 0 1 0 1 0 1 0 0
4 0 0 0 0 0 1 1 0 0 0 0 0 0 0
5 0 0 1 0 0 0 0 1 0 1 0 1 0 0
6 0 0 0 1 0 0 1 0 0 0 0 0 0 0
7 0 0 0 1 0 1 0 0 0 0 0 0 0 0
8 0 0 1 0 1 0 0 0 1 1 1 1 1 0
9 0 0 0 0 0 0 0 1 0 1 1 1 1 0
10 0 0 1 0 1 0 0 1 1 0 1 1 1 0
11 0 0 0 0 0 0 0 1 1 1 0 1 1 0
12 0 0 1 0 1 0 0 1 1 1 1 0 1 0
13 0 0 0 0 0 0 0 1 1 1 1 1 0 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Upvotes: 2
Reputation: 120429
dfm = df.merge(df, on="Company").query("Firm_x != Firm_y")
out = pd.crosstab(dfm['Firm_x'], dfm['Firm_y'])
>>> out
Firm_y 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Firm_x
1 1 0 0 0 0 0 0 0 0 0 0 0 0 0
2 0 1 0 0 0 0 0 0 0 0 0 0 0 0
3 0 0 1 0 0 0 0 0 0 0 0 0 0 0
4 0 0 0 1 0 0 0 0 0 0 0 0 0 0
5 0 0 0 0 4 0 0 0 0 0 0 0 0 0
6 0 0 0 0 0 1 0 0 0 0 0 0 0 0
7 0 0 0 0 0 0 1 0 0 0 0 0 0 0
8 0 0 0 0 0 0 0 2 0 0 0 0 0 0
9 0 0 0 0 0 0 0 0 1 0 0 0 0 0
10 0 0 0 0 0 0 0 0 0 5 0 0 0 0
11 0 0 0 0 0 0 0 0 0 0 1 0 0 0
12 0 0 0 0 0 0 0 0 0 0 0 2 0 0
13 0 0 0 0 0 0 0 0 0 0 0 0 1 0
14 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Upvotes: 0