Reputation: 1161
I have a character vector in which each element is a composite of n substrings. It could look like this:
string <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
Every subcomponent is separated from the others by "_". Here, the first level consists of the values "A" and "B", the second level of "AA", "BB" and "CC", and the third level of "AAA". Deeper nestings are possible, and the solution should extend to those cases. The nestings are not necessarily balanced: "A" has only two children while "B" has three, but "A" also has a grandchild, which "B" does not.
Essentially, I want to recreate the nested structure of this string in some kind of R object, preferably a list. The nested list would look like this:
list("A" = list("AA", "BB" = list("AAA")),
"B" = list("AA", "BB", "CC"))
$A
$A[[1]]
[1] "AA"
$A$BB
$A$BB[[1]]
[1] "AAA"
$B
$B[[1]]
[1] "AA"
$B[[2]]
[1] "BB"
$B[[3]]
[1] "CC"
Any help on this is appreciated
Upvotes: 5
Views: 1121
Reputation: 145
I found this way to do it. It's weird, but it seems to work:
my_relist <- function(x) {
  y <- list()
  # This first loop creates the skeleton of the list.
  # Note: every element of x must have at least two "_"-separated parts.
  for (name in x) {
    split <- strsplit(name, '_', fixed = TRUE)[[1]]
    char <- 'y'
    l <- length(split)
    for (i in 1:(l - 1)) {
      char <- paste(char, '$', split[i], sep = "")
    }
    char2 <- paste(char, '= list()', sep = "")
    # Example of char2: "y$A$BB= list()"
    # Evaluate the expression stored in char2
    eval(parse(text = char2))
  }
  # The second loop fills the list with the last element of each path
  for (name in x) {
    split <- strsplit(name, '_', fixed = TRUE)[[1]]
    char <- 'y'
    l <- length(split)
    for (i in 1:(l - 1)) {
      char <- paste(char, '$', split[i], sep = "")
    }
    char3 <- paste(char, '=c(', char, ',split[l])')
    # Example of char3: "y$A =c( y$A ,split[l])"
    eval(parse(text = char3))
  }
  return(y)
}
And this is the result:
example <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
my_relist(example)
#$A
#$A$BB
#$A$BB[[1]]
#[1] "AAA"
#$A[[2]]
#[1] "AA"
#$A[[3]]
#[1] "BB"
#$B
#$B[[1]]
#[1] "AA"
#$B[[2]]
#[1] "BB"
#$B[[3]]
#[1] "CC"
Upvotes: 0
Reputation: 18331
You can make it into a matrix without too much fuss...
string <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
splitted <- strsplit(string, "_")
# pad every split vector with NA up to the longest one, then stack them as rows
cols <- max(lengths(splitted))
mat <- do.call(rbind, lapply(splitted, "length<-", cols))
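If a nested list is needed rather than a matrix, the same strsplit() output can be fed to a small recursive helper. This is only a sketch: nest_paths is a made-up name, and it assumes that a component without children should be stored as an unnamed string while a component with children becomes a named sub-list, as in the target structure from the question.

```r
# Hypothetical helper (not from any package): build a nested list from
# a list of character vectors, each holding the components of one path.
nest_paths <- function(parts) {
  # drop paths that have been fully consumed
  parts <- parts[lengths(parts) > 0]
  if (length(parts) == 0) return(list())
  # first component of every remaining path
  heads <- vapply(parts, `[`, character(1), 1)
  out <- list()
  for (h in unique(heads)) {
    # remaining components of all paths that start with h
    rest <- lapply(parts[heads == h], `[`, -1)
    children <- nest_paths(rest)
    if (length(children) == 0) {
      out <- c(out, list(h))   # no children: unnamed element holding the label
    } else {
      out[[h]] <- children     # has children: named sub-list
    }
  }
  out
}

string <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
nested <- nest_paths(strsplit(string, "_", fixed = TRUE))
str(nested)
```

Grouping the paths by their first component and recursing on the remainder should cope with unbalanced and arbitrarily deep nestings.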
Upvotes: 3
Reputation: 10152
Not so straightforward, and not the most beautiful code, but it should do the job and return a list:
string <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
# loop through each element of the string "str_el"
# (each element must have at least two "_"-separated parts)
list_els <- lapply(string, function(str_el) {
  # split the string into parts
  els <- strsplit(str_el, "_")[[1]]
  # loop backwards through the elements
  for (i in length(els):1) {
    # the last element gives the value
    if (i == length(els)) {
      # assign the value to a list and name it after its parent
      res <- list(els[[i]])
      names(res) <- els[[i - 1]]
    } else if (i != 1) {
      # if it's not the last element (value), wrap the list res in another
      # list named after the parent of that element
      res <- list(res)
      names(res) <- els[[i - 1]]
    }
  }
  return(res)
})
# collect the per-string lists in one list (one entry per input string)
res_list <- mapply(c, list_els, SIMPLIFY = FALSE)
res_list
# [[1]]
# [[1]]$A
# [1] "AA"
#
#
# [[2]]
# [[2]]$A
# [1] "BB"
#
#
# [[3]]
# [[3]]$A
# [[3]]$A$BB
# [1] "AAA"
#
#
#
# [[4]]
# [[4]]$B
# [1] "AA"
#
#
# [[5]]
# [[5]]$B
# [1] "BB"
#
#
# [[6]]
# [[6]]$B
# [1] "CC"
Does that give you what you want?
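The backward loop above can also be written recursively. A minimal sketch, assuming the same per-string output shape as above (the last part is the value, every earlier part is a name); path_to_list is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: turn the parts of one path into a nested list.
path_to_list <- function(els) {
  # like the loop version, this needs at least a name and a value
  stopifnot(length(els) >= 2)
  if (length(els) == 2) {
    # innermost level: the value, named after its parent
    return(setNames(list(els[2]), els[1]))
  }
  # otherwise wrap the nested remainder in a list named after the first part
  setNames(list(path_to_list(els[-1])), els[1])
}

string <- c("A_AA", "A_BB", "A_BB_AAA", "B_AA", "B_BB", "B_CC")
list_els <- lapply(strsplit(string, "_", fixed = TRUE), path_to_list)
str(list_els[[3]])
```

As with the loop, this yields one nested list per input string and does not merge the strings that share a prefix.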
Upvotes: 1