Reputation: 35
I have a table of alphanumeric IDs and associated values. There are instances of one ID associated with multiple values. On the other hand, there are also instances of a given value corresponding to several IDs. I am trying to collapse the table in such a way that each ID and each value is only listed once. I found a few questions that are close, but not quite the same, e.g. R finding duplicates in one column and collapsing in a second column, or aggregate values from several fields into one
This is an example of what my data look like:
library(data.table)
ID = c("c067","c155","l413","l413","k456","z025","z026","z025","z026")
value = c(47, 47, 57, 58, "vwd", 85, 85, "ca", "ca")
x.raw = data.table(ID, value)
I can get a partial solution by doing this:
x.tidy1 = x.raw[,list(value = list(value)), by = ID]
But, notice that the values "47" and "c("85", "ca")" each occur twice. What I really would like is to get something similar to this at the end:
ID = list(c("c067","c155"),"l413","k456",c("z025","z026"))
value = list("47",c("57","58"),"vwd",c("85","ca"))
x.tidy2 = data.table(ID, value)
# ID value
# 1: c067,c155 47
# 2: l413 57,58
# 3: k456 vwd
# 4: z025,z026 85,ca
I have tried this, without success:
x.tidy2 = x.tidy1[,list(ID = list(ID)), by = value]
How could I do this?
Upvotes: 2
Views: 211
Reputation: 66819
To find connected components and label edges with them...
library(igraph)
g = graph_from_edgelist(as.matrix(x.raw))
mem = components(g)$membership
x.raw[, grp := mem[ match(ID, names(mem))] ]
Then you can summarize as the OP had wanted:
x.raw[, lapply(.SD, function(x) .(unique(x))), by=grp]
# grp ID value
# 1: 1 c067,c155 47
# 2: 2 l413 57,58
# 3: 3 k456 vwd
# 4: 4 z025,z026 85,ca
List columns like this are a very difficult format for further analysis. If you just want to visualize the components...
V(g)$type = V(g)$name %like% "^[a-z][0-9]{3}$"
plot(g, layout = layout.bipartite(g))
Or, you know...
com = make_clusters(g, clusters(g)$mem)
plot(com, g)
Upvotes: 3
Reputation: 18331
One thing you could do is paste
the values together like this
x.tidy3<-x.raw[,.(ID=paste0(ID,collapse=',')),by='value'][,.(value=paste0(value,collapse=',')),by='ID']
Upvotes: 2