Robin
Robin

Reputation: 59

How to match multiple corresponding values between columns in R

I have a data frame that is structured as follows

ID  Value1    Value2
1   a;d;g;f   12;14;15;9
2   b;c;e     5;18;20
3   h;i;j     6;7;25

So I have an ID and two values, for value 1 one there are multiple options that correspond to value 2. And I want to end up with the following data frame i.e. every option of value 1 with its corresponding value 2

ID  Value1   Value2
1   a        12
1   d        14
1   g        15
1   f        9
2   b        5
2   c        18
2   e        2
3   h        6
3   i        7
3   j        25

How can I script R to do this?

Upvotes: 2

Views: 113

Answers (3)

Cath
Cath

Reputation: 24074

Or with base r (df1 being borrowed from @akrun):

vals <- lapply(colnames(df1)[-1], function(col){strsplit(df1[, col], ";")})
names(vals) <- colnames(df1)[-1]
v_id <- mapply(rep, df1$ID, lapply(s_v1, length))

data.frame(ID=unlist(v_id), lapply(vals, unlist), stringsAsFactors=F)

#    ID Value1 Value2
#1   1      a     12
#2   1      d     14
#3   1      g     15
#4   1      f      9
#5   2      b      5
#6   2      c     18
#7   2      e     20
#8   3      h      6
#9   3      i      7
#10  3      j     25

Upvotes: 3

David Arenburg
David Arenburg

Reputation: 92300

Another option is to use tstrsplit from the devel version of data.table

library(data.table)
setDT(df)[, lapply(.SD, function(x) unlist(tstrsplit(x, ";", fixed=TRUE))), by = ID]
#     ID Value1 Value2
#  1:  1      a     12
#  2:  1      d     14
#  3:  1      g     15
#  4:  1      f      9
#  5:  2      b      5
#  6:  2      c     18
#  7:  2      e     20
#  8:  3      h      6
#  9:  3      i      7
# 10:  3      j     25

Upvotes: 4

akrun
akrun

Reputation: 887851

You can try cSplit

library(splitstackshape)
cSplit(df1, sep=";", c('Value1', 'Value2'), 'long', type.convert=TRUE)
#    ID Value1 Value2
#1:  1      a     12
#2:  1      d     14
#3:  1      g     15
#4:  1      f      9
#5:  2      b      5
#6:  2      c     18
#7:  2      e     20
#8:  3      h      6
#9:  3      i      7
#10: 3      j     25

data

df1 <- structure(list(ID = 1:3, Value1 = c("a;d;g;f", "b;c;e", "h;i;j"
), Value2 = c("12;14;15;9", "5;18;20", "6;7;25")), .Names = c("ID", 
"Value1", "Value2"), class = "data.frame", row.names = c(NA, -3L))

Upvotes: 4

Related Questions