Reputation: 107
I have read a lot about creating an adjacency matrix in R. But I have a slightly odd requirement, I cannot crack the data structure for.
I have a data set that looks like this.
Case | Person | Status |
---|---|---|
ABC01 | 99999 | Plaintiff |
ABC01 | 11111 | Defendant |
ABC02 | 22222 | Plaintiff |
ABC02 | 99999 | Defendant |
ABC03 | 33333 | Plaintiff |
ABC03 | 44444 | Defendant |
ABC04 | 55555 | Plaintiff |
ABC04 | 66666 | Defendant |
ABC05 | 99999 | Plaintiff |
ABC05 | 88888 | Defendant |
ABC06 | 77777 | Plaintiff |
ABC06 | 22222 | Defendant |
ABC07 | 11111 | Plaintiff |
ABC07 | 44444 | Defendant |
ABC08 | 44444 | Plaintiff |
ABC08 | 99999 | Defendant |
Hopefully the columns are self-evident. The output of the adjacency matrix should look something like this where for each case (dropped from the final table) a unique count of each parties involvement in each role should be output. this would allow for a network analysis of plaintiffs and defendants.
Plaintiff | Defendant | Count |
---|---|---|
99999 | 11111 | 1 |
22222 | 99999 | 1 |
33333 | 44444 | 1 |
55555 | 66666 | 1 |
99999 | 88888 | 1 |
77777 | 22222 | 1 |
11111 | 44444 | 1 |
Note that as the plaintiff and defendant switch roles their index is repeated. But if there are multiple lawsuits between the same to parties in the same positions than this should be reflected in the count of incidents.
This is the current solution
df %>%
group_by(Case,Person) %>%
mutate(count = n()) %>%
ungroup() %>%
mutate(row=row_number())%>%
spread(Status,count)
The issue with is solution is that rows are offset.
Crime Reference Number | Person Record URN (ACN) | row | Plantiff | Defendant |
---|---|---|---|---|
ACB01 | 8645499 | 1610 | 1 | NA |
ACB02 | 8620113 | 1456 | NA | 1 |
ACB02 | 8708027 | 1457 | 1 | NA |
ACB03 | 8667531 | 1455 | 1 | NA |
ACB04 | 8650244 | 1458 | 1 | NA |
ACB05 | 8613947 | 1214 | 1 | NA |
ACB06 | 9074764 | 1022 | 1 | NA |
ACB07 | 8949458 | 1459 | 1 | NA |
And help or assistance would be greatly appreciated.
Upvotes: 0
Views: 258
Reputation: 887881
We could use data.table
library(data.table)
dcast(setDT(df), frank(Case, ties.method = 'dense') ~ Status,
value.var = 'Person')[, .(n = .N), .(Plaintiff, Defendant)]
-output
Plaintiff Defendant n
1: 99999 11111 1
2: 22222 99999 1
3: 33333 44444 1
4: 55555 66666 1
5: 99999 88888 1
6: 77777 22222 1
7: 11111 44444 1
8: 44444 99999 1
df <- structure(list(Case = c("ABC01", "ABC01", "ABC02", "ABC02", "ABC03",
"ABC03", "ABC04", "ABC04", "ABC05", "ABC05", "ABC06", "ABC06",
"ABC07", "ABC07", "ABC08", "ABC08"), Person = c(99999L, 11111L,
22222L, 99999L, 33333L, 44444L, 55555L, 66666L, 99999L, 88888L,
77777L, 22222L, 11111L, 44444L, 44444L, 99999L), Status = c("Plaintiff",
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant",
"Plaintiff", "Defendant", "Plaintiff", "Defendant", "Plaintiff",
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant"
)), class = "data.frame", row.names = c(NA, -16L))
Upvotes: 1
Reputation: 389265
We can assign an id column for Case
and get the data in wide format. Then use count
to count how many times the combination of Plaintiff
and Defendant
occur.
library(dplyr)
library(tidyr)
df %>%
mutate(Case = dense_rank(Case)) %>%
pivot_wider(names_from = Status, values_from = Person) %>%
count(Plaintiff, Defendant)
# Plaintiff Defendant n
# <int> <int> <int>
#1 11111 44444 1
#2 22222 99999 1
#3 33333 44444 1
#4 44444 99999 1
#5 55555 66666 1
#6 77777 22222 1
#7 99999 11111 1
#8 99999 88888 1
Upvotes: 1