Reputation: 15
I would like to split a certain format of data from one column into multiple columns. Below are my sample data :
df = data.frame(id=c(1,2),data=c('^apple:1%2%3^orange:3%4%5',
'^apple:4%5%6^orange:7%8%9'))
# id data
# 1 ^apple:1%2%3^orange:3%4%5
# 2 ^apple:4%5%6^orange:7%8%9
which will then gives the following output
id data_apple_A data_apple_B data_apple_C data_orange_A data_orange_B data_orange_C
1 1 2 3 3 4 5
2 4 5 6 7 8 9
I am using the following code to try and get the values for A,B and C(they could be decimal values):
cSplit(df, "data", sep="\\^", fixed = FALSE,direction= "long")[, c('valA','valB','valC')
:=(str_extract(data, "\\d+\\.*\\d*")),str_extract(data, "(?<=%)\\d+\\.*\\d*"),
str_extract(data, "(?<=%)\\d+\\.*\\d*$") ][]
but was getting the following error:
Error in `[.data.table`(cSplit(df, "data", sep = "\\^", fixed = FALSE, :
Provide either 'by' or 'keyby' but not both
Do note that apple and orange is just a placeholder. it could be characters and the number of ^(str): could also be different for each row. Also, A B C are fixed. for every ^(str): , there will only be 3 decimal or non decimal numbers in the format of 1%2%3.
Any help would be appreciated.
Upvotes: 0
Views: 221
Reputation: 1273
Here is a solution using tidyr::extract
:
library(tidyr)
df %>%
extract(data,
into = paste(rep("data", 6),
c("apple", "orange"),
rep(c("A", "B", "C"), 2), sep = "_"),
regex = "\\^apple:([0-9])\\%([0-9])\\%([0-9])\\^orange:([0-9])\\%([0-9])\\%([0-9])")
Upvotes: 1