Jacob Baisley
Jacob Baisley

Reputation: 107

Creating an Adjacency Matrix in R

I have read a lot about creating an adjacency matrix in R. But I have a slightly odd requirement, I cannot crack the data structure for.

I have a data set that looks like this.

Case Person Status
ABC01 99999 Plaintiff
ABC01 11111 Defendant
ABC02 22222 Plaintiff
ABC02 99999 Defendant
ABC03 33333 Plaintiff
ABC03 44444 Defendant
ABC04 55555 Plaintiff
ABC04 66666 Defendant
ABC05 99999 Plaintiff
ABC05 88888 Defendant
ABC06 77777 Plaintiff
ABC06 22222 Defendant
ABC07 11111 Plaintiff
ABC07 44444 Defendant
ABC08 44444 Plaintiff
ABC08 99999 Defendant

Hopefully the columns are self-evident. The output of the adjacency matrix should look something like this where for each case (dropped from the final table) a unique count of each parties involvement in each role should be output. this would allow for a network analysis of plaintiffs and defendants.

Plaintiff Defendant Count
99999 11111 1
22222 99999 1
33333 44444 1
55555 66666 1
99999 88888 1
77777 22222 1
11111 44444 1

Note that as the plaintiff and defendant switch roles their index is repeated. But if there are multiple lawsuits between the same to parties in the same positions than this should be reflected in the count of incidents.

This is the current solution

df %>% 
group_by(Case,Person) %>% 
mutate(count = n()) %>%
ungroup() %>%
mutate(row=row_number())%>%
spread(Status,count)

The issue with is solution is that rows are offset.

Crime Reference Number Person Record URN (ACN) row Plantiff Defendant
ACB01 8645499 1610 1 NA
ACB02 8620113 1456 NA 1
ACB02 8708027 1457 1 NA
ACB03 8667531 1455 1 NA
ACB04 8650244 1458 1 NA
ACB05 8613947 1214 1 NA
ACB06 9074764 1022 1 NA
ACB07 8949458 1459 1 NA

And help or assistance would be greatly appreciated.

Upvotes: 0

Views: 258

Answers (2)

akrun
akrun

Reputation: 887881

We could use data.table

library(data.table)
dcast(setDT(df), frank(Case, ties.method = 'dense') ~ Status,
     value.var = 'Person')[, .(n = .N), .(Plaintiff, Defendant)]

-output

  Plaintiff Defendant n
1:     99999     11111 1
2:     22222     99999 1
3:     33333     44444 1
4:     55555     66666 1
5:     99999     88888 1
6:     77777     22222 1
7:     11111     44444 1
8:     44444     99999 1

data

df <- structure(list(Case = c("ABC01", "ABC01", "ABC02", "ABC02", "ABC03", 
"ABC03", "ABC04", "ABC04", "ABC05", "ABC05", "ABC06", "ABC06", 
"ABC07", "ABC07", "ABC08", "ABC08"), Person = c(99999L, 11111L, 
22222L, 99999L, 33333L, 44444L, 55555L, 66666L, 99999L, 88888L, 
77777L, 22222L, 11111L, 44444L, 44444L, 99999L), Status = c("Plaintiff", 
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant", 
"Plaintiff", "Defendant", "Plaintiff", "Defendant", "Plaintiff", 
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant"
)), class = "data.frame", row.names = c(NA, -16L))

Upvotes: 1

Ronak Shah
Ronak Shah

Reputation: 389265

We can assign an id column for Case and get the data in wide format. Then use count to count how many times the combination of Plaintiff and Defendant occur.

library(dplyr)
library(tidyr)

df %>%
  mutate(Case = dense_rank(Case)) %>%
  pivot_wider(names_from = Status, values_from = Person) %>%
  count(Plaintiff, Defendant)

#  Plaintiff Defendant     n
#      <int>     <int> <int>
#1     11111     44444     1
#2     22222     99999     1
#3     33333     44444     1
#4     44444     99999     1
#5     55555     66666     1
#6     77777     22222     1
#7     99999     11111     1
#8     99999     88888     1

Upvotes: 1

Related Questions