Reputation: 451
I have the following data table:
RowID| Col1 | Col2 |
----------------------
1 | apple | cow |
2 | orange | dog |
3 | apple | cat |
4 | cherry | fish |
5 | cherry | ant |
6 | apple | rat |
I want get to this table:
RowID| Col1 | Col2 | newCol
------------------------------
1 | apple | cow | cat
2 | apple | cow | rat
3 | orange | dog | na
4 | apple | cat | cow
5 | apple | cat | rat
6 | cherry | fish | ant
7 | cherry | ant | fish
8 | apple | rat | cow
9 | apple | rat | cat
To help visualize the logic of the above table, it would be essentially the same as the below table but with the list column being split into rows depending on the values present. It's matching the values in col1, so for example rows 1 3 and 6 of the original table have 'apple' in column one. Therefore, the new 'list' column would include all the Col2 values of the matching rows. Then be expanded into a new row for each list element. The second table above is the outcome I want, this third table is just here to help visualize where the values are coming from.
RowID| Col1 | Col2 | newCol
------------------------------
1 | apple | cow | cat,rat (Row 3 & 6 match col1 values)
2 | orange | dog | na (No rows match this col1 value)
3 | apple | cat | cow,rat (Row 1 & 6 match col1 values)
4 | cherry | fish | ant (Row 5 matches col1 values)
5 | cherry | ant | fish (Row 4 matches col1 values)
6 | apple | rat | cow,cat (Row 1 & 3 match col1 values)
Upvotes: 0
Views: 63
Reputation: 83225
Using the data.table package:
library(data.table)
# option 1
setDT(dat)[, .SD[CJ(Col2 = Col2, newCol = Col2, unique = TRUE), on = .(Col2)]
, by = Col1
][order(RowID), .SD[Col2 != newCol | .N == 1], by = RowID]
# option 2
setDT(dat)[, newCol := paste0(Col2, collapse = ","), by = Col1
][, .(newCol = unlist(tstrsplit(newCol, ","))), by = .(RowID, Col1, Col2)
][, .SD[Col2 != newCol | .N == 1], by = RowID]
which gives:
RowID Col1 Col2 newCol 1: 1 apple cow cat 2: 1 apple cow rat 3: 2 orange dog dog 4: 3 apple cat cow 5: 3 apple cat rat 6: 4 cherry fish ant 7: 5 cherry ant fish 8: 6 apple rat cow 9: 6 apple rat cat
A tidyverse equivalent:
library(dplyr)
library(tidyr)
dat %>%
group_by(Col1) %>%
mutate(newCol = paste0(Col2, collapse = ",")) %>%
separate_rows(newCol) %>%
group_by(RowID) %>%
filter(Col2 != newCol | n() == 1)
Upvotes: 2
Reputation: 342
Self join the table on the first column, get rid of rows where NewCol equals Col2. The difficult bit is keeping the rows in the data.table that only occur once.
require(data.table)
require(magrittr)
dt_foo = data.table(Col1 = c("apple", "orange","apple","cherry",
"cherry", "apple"),
Col2 = c("cow","dog","cat","fish",
"ant","rat"))
# required to later set NA values
single_occ = dt_foo[, .N, Col1] %>%
.[N == 1, Col1]
dt_foo2 = dt_foo %>%
.[., on = "Col1", allow.cartesian = T] %>%
setnames("i.Col2", "NewCol") %>%
.[Col1 %in% single_occ, NewCol := NA] %>%
.[Col2 != NewCol | is.na(NewCol)]
Upvotes: 0