kaexch
kaexch

Reputation: 15

Split into mutiple columns by a certain format

I would like to split a certain format of data from one column into multiple columns. Below are my sample data :

df = data.frame(id=c(1,2),data=c('^apple:1%2%3^orange:3%4%5',
                                   '^apple:4%5%6^orange:7%8%9'))
#    id    data
#    1    ^apple:1%2%3^orange:3%4%5
#    2    ^apple:4%5%6^orange:7%8%9

which will then gives the following output

 id   data_apple_A  data_apple_B  data_apple_C  data_orange_A  data_orange_B  data_orange_C
 1       1               2              3             3             4              5
 2       4               5              6             7             8              9

I am using the following code to try and get the values for A,B and C(they could be decimal values):

cSplit(df, "data", sep="\\^", fixed = FALSE,direction= "long")[, c('valA','valB','valC')
        :=(str_extract(data, "\\d+\\.*\\d*")),str_extract(data, "(?<=%)\\d+\\.*\\d*"),
        str_extract(data, "(?<=%)\\d+\\.*\\d*$") ][]

but was getting the following error:

Error in `[.data.table`(cSplit(df, "data", sep = "\\^", fixed = FALSE,  : 
  Provide either 'by' or 'keyby' but not both

Do note that apple and orange is just a placeholder. it could be characters and the number of ^(str): could also be different for each row. Also, A B C are fixed. for every ^(str): , there will only be 3 decimal or non decimal numbers in the format of 1%2%3.

Any help would be appreciated.

Upvotes: 0

Views: 221

Answers (1)

jlesuffleur
jlesuffleur

Reputation: 1273

Here is a solution using tidyr::extract:

library(tidyr)
df %>% 
  extract(data, 
          into = paste(rep("data", 6),
                       c("apple", "orange"),
                       rep(c("A", "B", "C"), 2), sep = "_"), 
          regex = "\\^apple:([0-9])\\%([0-9])\\%([0-9])\\^orange:([0-9])\\%([0-9])\\%([0-9])")

Upvotes: 1

Related Questions