piie
piie

Reputation: 645

R Multiple Columns by Delimiter in each Column, Match String to Value

Sorry for the noob question, but after a few days I haven't been able to figure out how to do this. I've been trying to use R. Simply put, I have two columns as follows

A:B:C:D:F | 1.1:2.1:3.1:4.1:6.1
A:B:D:F | 1.2:2.2:4.2:6.2
A:B:C:F | 1.3:2.3:3.3:6.3
B:C:D:F | 2.4:3.4:4.4:6.4

Note the delimiter is ':'. At the end I want to have this:

A   | B | C | D | E | F
1.1 | 2.1   | 3.1 | 4.1 | NA | 6.1
1.2 | 2.2 | NA  | 4.2   | NA    | 6.2
1.3 | 2.3   | 3.3   | NA    | NA    | 6.3
NA  | 2.4   | 3.4   | 4.4   | NA    | 6.4

Why I can't solve it:

I would imagine something similar to the following, replacing "1" for the corresponding column 2, row x value, but I have no idea how to do this. Another approach I experimented with but was stuck at the inserting column 2 values was to create columns with 1 and 0 only based on if a string was present, which was how I came across the code snippet.

df$A <- ifelse(grepl("A", df$PASS, ignore.case = T), "1", "0")

Sorry for the long writeup, but I'm super stuck. I feel this is beyond my beginner level R. Major kudos to anyone that can solve this!

Upvotes: 0

Views: 106

Answers (4)

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193657

I'd recommend looking at a combination of cSplit from my "splitstackshape" package along with dcast:

library(splitstackshape)
dcast(cSplit(as.data.table(df1)[, id := 1:nrow(df1)], names(df1), ":", "long"), 
      id ~ col1, value.var = "col2")
   id   A   B   C   D   F
1:  1 1.1 2.1 3.1 4.1 6.1
2:  2 1.2 2.2  NA 4.2 6.2
3:  3 1.3 2.3 3.3  NA 6.3
4:  4  NA 2.4 3.4 4.4 6.4

Upvotes: 1

Onyambu
Onyambu

Reputation: 79318

Using base R and writing in a general format:

using the data

 df1 <- structure(list(col1 = c("A:B:C:D:F", "A:B:D:F", "A:B:C:F", "B:C:D:F"), col2 = c
("1.1:2.1:3.1:4.1:6.1", "1.2:2.2:4.2:6.2", "1.3:2.3:3.3:6.3", "2.4:3.4:4.4:6.4")),
.Names = c("col1", "col2"), class = "data.frame",  row.names = c(NA, -4L))

Then the code will be

a=apply(df1,1,strsplit,":",fixed=TRUE)
b=lapply(a,function(x){y=`names<-`(as.numeric(x[[2]]),x[[1]]);data.frame(t(y))})
d=Reduce(function(x,y) merge(x,y,by =intersect(names(x),names(y)),all.x = TRUE,all.y = TRUE),b)
b1=max(match(names(unlist(b)),LETTERS))
e=LETTERS[1:b1][!LETTERS[1:b1]%in%names(d)]
f=`names<-`(data.frame(d,rep(list(NA),length(e))),c(names(d),e))
g=f[,order(names(f))]
Reduce(rbind,c(paste0(names(g),collapse = " | "),apply(g,1,paste0,collapse="|")))

If the data is

 x <- "A:B:C:D:F | 1.1:2.1:3.1:4.1:6.1
       A:B:D:F | 1.2:2.2:4.2:6.2 
       A:B:C:F | 1.3:2.3:3.3:6.3
       B:C:D:F | 2.4:3.4:4.4:6.4"

Then you just need to transform this with the one line code below and run the above code.

 df1=matrix(unlist(strsplit(unlist(strsplit(x,"\n"))," | ",fixed = T)),ncol=2,byrow=T)

Upvotes: 1

shians
shians

Reputation: 965

Here's a solution in just base R without the magic of tidyverse. It assumes you can read in all your data as a big string but it's not too hard to alter it for an input stream.

x <- "A:B:C:D:F | 1.1:2.1:3.1:4.1:6.1
A:B:D:F | 1.2:2.2:4.2:6.2
A:B:C:F | 1.3:2.3:3.3:6.3
B:C:D:F | 2.4:3.4:4.4:6.4"

data <- unlist(str_split(x, "\n"))
result <- matrix(as.numeric(NA), nrow = length(data), ncol = 6)
colnames(result) <- c("A", "B", "C", "D", "E", "F")

for (i in 1:length(data)) {
    split_data <- unlist(str_split(data[i], " [|] "))
    print(split_data)
    indices <- unlist(str_split(split_data[1], ":"))
    values <- unlist(str_split(split_data[2], ":"))

    for (j in 1:length(indices)) {
        result[i, indices[j]] <- as.numeric(values[j])
    }
}

result

Upvotes: 3

akrun
akrun

Reputation: 887671

We can try with tidyverse

library(tidyverse)
res <- df1 %>%
        map(~strsplit(., ":")) %>% 
        transpose %>% 
        map(~set_names(as.data.frame.list(as.numeric(.[[2]])), .[[1]]))  %>% 
        bind_rows %>%
        right_join(as.data.frame(setNames(rep(list(NA), 6), LETTERS[1:6])), .) 

res
#    A   B   C   D  E   F
#1 1.1 2.1 3.1 4.1 NA 6.1
#2 1.2 2.2  NA 4.2 NA 6.2
#3 1.3 2.3 3.3  NA NA 6.3
#4  NA 2.4 3.4 4.4 NA 6.4

data

df1 <- structure(list(col1 = c("A:B:C:D:F", "A:B:D:F", "A:B:C:F", "B:C:D:F"
 ), col2 = c("1.1:2.1:3.1:4.1:6.1", "1.2:2.2:4.2:6.2", "1.3:2.3:3.3:6.3", 
 "2.4:3.4:4.4:6.4")), .Names = c("col1", "col2"), class = "data.frame", 
 row.names = c(NA, -4L))

Upvotes: 2

Related Questions