Santi
Santi

Reputation: 451

How to add expand a data table based on table information

I have the following data table:

RowID| Col1   | Col2 |
----------------------
1    | apple  | cow  |
2    | orange | dog  |
3    | apple  | cat  |
4    | cherry | fish |
5    | cherry | ant  |
6    | apple  | rat  |

I want get to this table:

RowID| Col1   | Col2 | newCol
------------------------------
1    | apple  | cow  | cat
2    | apple  | cow  | rat   
3    | orange | dog  | na        
4    | apple  | cat  | cow
5    | apple  | cat  | rat   
6    | cherry | fish | ant       
7    | cherry | ant  | fish      
8    | apple  | rat  | cow
9    | apple  | rat  | cat   

To help visualize the logic of the above table, it would be essentially the same as the below table but with the list column being split into rows depending on the values present. It's matching the values in col1, so for example rows 1 3 and 6 of the original table have 'apple' in column one. Therefore, the new 'list' column would include all the Col2 values of the matching rows. Then be expanded into a new row for each list element. The second table above is the outcome I want, this third table is just here to help visualize where the values are coming from.

RowID| Col1   | Col2 | newCol
------------------------------
1    | apple  | cow  | cat,rat   (Row 3 & 6 match col1 values)
2    | orange | dog  | na        (No rows match this col1 value)
3    | apple  | cat  | cow,rat   (Row 1 & 6 match col1 values)
4    | cherry | fish | ant       (Row 5 matches col1 values)
5    | cherry | ant  | fish      (Row 4 matches col1 values)
6    | apple  | rat  | cow,cat   (Row 1 & 3 match col1 values)

Upvotes: 0

Views: 63

Answers (2)

Jaap
Jaap

Reputation: 83225

Using the package:

library(data.table)

# option 1
setDT(dat)[, .SD[CJ(Col2 = Col2, newCol = Col2, unique = TRUE), on = .(Col2)]
           , by = Col1
           ][order(RowID), .SD[Col2 != newCol | .N == 1], by = RowID]

# option 2
setDT(dat)[, newCol := paste0(Col2, collapse = ","), by = Col1
           ][, .(newCol = unlist(tstrsplit(newCol, ","))), by = .(RowID, Col1, Col2)
             ][, .SD[Col2 != newCol | .N == 1], by = RowID]

which gives:

   RowID   Col1 Col2 newCol
1:     1  apple  cow    cat
2:     1  apple  cow    rat
3:     2 orange  dog    dog
4:     3  apple  cat    cow
5:     3  apple  cat    rat
6:     4 cherry fish    ant
7:     5 cherry  ant   fish
8:     6  apple  rat    cow
9:     6  apple  rat    cat

A equivalent:

library(dplyr)
library(tidyr)

dat %>% 
  group_by(Col1) %>% 
  mutate(newCol = paste0(Col2, collapse = ",")) %>% 
  separate_rows(newCol) %>% 
  group_by(RowID) %>% 
  filter(Col2 != newCol | n() == 1)

Upvotes: 2

Juergen
Juergen

Reputation: 342

Self join the table on the first column, get rid of rows where NewCol equals Col2. The difficult bit is keeping the rows in the data.table that only occur once.

require(data.table)
require(magrittr)

dt_foo = data.table(Col1 = c("apple", "orange","apple","cherry",
                      "cherry", "apple"),
                    Col2 = c("cow","dog","cat","fish",
                      "ant","rat"))

# required to later set NA values
single_occ = dt_foo[, .N, Col1] %>% 
  .[N == 1, Col1]

dt_foo2 = dt_foo %>% 
  .[., on = "Col1", allow.cartesian = T] %>% 
  setnames("i.Col2", "NewCol") %>% 
  .[Col1 %in% single_occ, NewCol := NA] %>% 
  .[Col2 != NewCol | is.na(NewCol)]

Upvotes: 0

Related Questions