I am looking for a way to loop through a hierarchy of unknown depth in R (I only see the data once I request it). For example, I request the top level of the hierarchy and put the result in a data frame:
id name
1 Books
2 DVDs
3 Computer
For the next step I want to look into the Books category, so I make a new request with id 1 and get:
id name
11 Child books
12 Fantasy
Next I want to look into the subcategory Child books, so I make a new request for id 11:
id name
111 Baby
112 Education
113 History
And so on:
id name
1111 Sound
1112 Touch
At the moment I don't know how deep each hierarchy is, but I can tell that the depth differs from category to category. In the end I would like the data frame to look like this:
id name id name id name id name id name
1 Books 11 Child books 111 Baby 1111 Sound ...
1 Books 11 Child books 111 Baby 1112 Touch ...
1 Books 11 Child books 112 Education etc.
1 Books 11 Child books 113 History etc.
1 Books 12 Fantasy etc.
.................
2 DVDs etc.
.................
3 Computer etc.
.................
My idea is to take the number of rows of the next level and repeat the parent row that many times:
df[rep(x,each=nrow(df_next)),]
But I have no idea how to loop when the number of levels i is unknown (and changes per category).
Reputation: 7163
Here's a not-so-elegant solution:
(i) subFn is a custom function that splits each id into its individual digits, one per hierarchy level (this assumes every level is encoded as a single digit):
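subFn <- function(id){
  id <- as.character(id)
  # ids from one request share the same depth, so the first id gives the length
  len <- nchar(id[1])
  # one list element per digit position, i.e. per hierarchy level
  tmp <- lapply(seq_len(len), function(x) substring(id, x, x))
  names(tmp) <- paste0("level_", seq_len(len))
  return(tmp)
}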
## example
subFn("1111")
$level_1
[1] "1"
$level_2
[1] "1"
$level_3
[1] "1"
$level_4
[1] "1"
(ii) Create a list of data frames in which each id is split into one column per level, so the number of columns depends on the length of the id:
dat_list <- lapply(list(df1, df2, df3), function(x) do.call(data.frame, c(list(name=x[, "name"], stringsAsFactors=FALSE), subFn(x[, "id"]))))
(iii) Use dplyr's left_join to join two frames at a time:
library(dplyr)
dat_list[[1]] %>%
  left_join(dat_list[[2]], by = "level_1") %>%
  left_join(dat_list[[3]], by = c("level_1", "level_2"))
name.x level_1 name.y level_2 name level_3
1 Books 1 Child books 1 Baby 1
2 Books 1 Child books 1 Education 2
3 Books 1 Child books 1 History 3
4 Books 1 Fantasy 2 <NA> <NA>
5 DVDs 2 <NA> <NA> <NA> <NA>
6 Computer 3 <NA> <NA> <NA> <NA>
To avoid writing out a long chain of left_join calls when there are many data frames, here's a solution inspired by "How to join multiple data frames using dplyr?":
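func <- function(df1, df2){
  # join on every level column already present in the accumulated frame
  col <- grep("level", names(df1), value = TRUE)
  left_join(df1, df2, by = col)
}
# fold the list from left to right, joining one level at a time
Reduce(func, dat_list)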
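If dplyr is not available, the same fold can be sketched in base R with merge(..., all.x = TRUE). This is only an illustration under the same assumption as above (each level is a single digit of the id). The helper names split_levels and join_levels and the name_1, name_2, ... column naming are made up for this sketch and are not part of the code above.

```r
# Input frames, one per hierarchy level (same data as in this answer).
df1 <- data.frame(id = 1:3, name = c("Books", "DVDs", "Computer"), stringsAsFactors = FALSE)
df2 <- data.frame(id = 11:12, name = c("Child books", "Fantasy"), stringsAsFactors = FALSE)
df3 <- data.frame(id = 111:113, name = c("Baby", "Education", "History"), stringsAsFactors = FALSE)

# Hypothetical helper: split each id into one character column per digit
# (level_1, level_2, ...) and keep the name under a depth-specific column.
split_levels <- function(df, depth) {
  id <- as.character(df$id)
  out <- data.frame(df$name, stringsAsFactors = FALSE)
  names(out) <- paste0("name_", depth)
  for (k in seq_len(depth)) out[[paste0("level_", k)]] <- substring(id, k, k)
  out
}

frames <- list(df1, df2, df3)
dat_list <- Map(split_levels, frames, seq_along(frames))

# Hypothetical helper: left-join on the level columns both sides share.
join_levels <- function(acc, nxt) {
  keys <- intersect(grep("^level_", names(acc), value = TRUE), names(nxt))
  merge(acc, nxt, by = keys, all.x = TRUE)
}

# Fold the list from left to right, one level at a time.
wide <- Reduce(join_levels, dat_list)
print(wide)
```

Because the fold runs over the whole list, adding a fourth or fifth level only means appending another frame to frames.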
Input data:
df1 <- data.frame(id = 1:3, name = c("Books", "DVDs", "Computer"))
df2 <- data.frame(id = 11:12, name = c("Child books", "Fantasy"))
df3 <- data.frame(id = 111:113, name=c("Baby", "Education", "History"))
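Since the depth is not known in advance, one option is to let recursion discover it instead of looping over a fixed number of levels. The following is a rough sketch, not a tested solution for the real service: fetch_children is a hypothetical stand-in for the actual request, and catalog is a made-up lookup table that mimics the example data from the question.

```r
# Made-up lookup table standing in for the remote data (assumption).
catalog <- data.frame(
  parent = c(NA, NA, NA, 1, 1, 11, 11, 11, 111, 111),
  id     = c(1, 2, 3, 11, 12, 111, 112, 113, 1111, 1112),
  name   = c("Books", "DVDs", "Computer", "Child books", "Fantasy",
             "Baby", "Education", "History", "Sound", "Touch"),
  stringsAsFactors = FALSE
)

# Hypothetical request function: children of `id` as a data frame (id, name);
# zero rows for a leaf. `id = NA` asks for the top level.
fetch_children <- function(id) {
  hit <- if (is.na(id)) is.na(catalog$parent) else catalog$parent %in% id
  catalog[hit, c("id", "name")]
}

# Depth-first walk. Returns a list with one entry per leaf; each entry is a
# data frame of the ids and names from the top level down to that leaf.
walk <- function(id = NA, path = NULL) {
  kids <- fetch_children(id)
  if (nrow(kids) == 0) return(list(path))   # leaf reached: emit the path
  out <- list()
  for (i in seq_len(nrow(kids))) {
    out <- c(out, walk(kids$id[i], rbind(path, kids[i, ])))
  }
  out
}

paths <- walk()

# Pad every path with NA up to the deepest level and bind into one wide frame.
max_depth <- max(vapply(paths, nrow, integer(1)))
to_row <- function(p) {
  vals <- list()
  for (k in seq_len(max_depth)) {
    vals[[paste0("id_", k)]]   <- p$id[k]     # indexing past the end gives NA
    vals[[paste0("name_", k)]] <- p$name[k]
  }
  as.data.frame(vals, stringsAsFactors = FALSE)
}
wide <- do.call(rbind, lapply(paths, to_row))
print(wide)
```

Each leaf costs one extra request that returns nothing, which is the price of not knowing the depth up front. Unlike the digit-splitting approach, this does not rely on how the ids are encoded.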