Reputation: 5088
Consider the following tree:
library(data.tree)
acme <- Node$new("Acme Inc.")
accounting <- acme$AddChild("Accounting")
software <- accounting$AddChild("New Software")
standards <- accounting$AddChild("New Accounting Standards")
research <- acme$AddChild("Research")
newProductLine <- research$AddChild("New Product Line")
newLabs <- research$AddChild("New Labs")
it <- acme$AddChild("IT")
outsource <- it$AddChild("Outsource")
agile <- it$AddChild("Go agile")
goToR <- it$AddChild("Switch to R")
I then want to compute the averageBranchingFactor:
averageBranchingFactor(acme)
This yields 2.5.
However, for various reasons I want to be able to get all the individual branching factors, not only their average. I need this to, for example, test statistically whether two file structures differ significantly in their branching factors.
According to the manual for data.tree, the AverageBranchingFactor() function performs the following: "calculate the average number of branches each non-leaf has." Therefore, I first tried the following:
acme.df <- ToDataFrameTree(acme, "averageBranchingFactor")
mean(acme.df$averageBranchingFactor[acme.df$averageBranchingFactor>0])
This yields 2.375, which then led me to try a simpler version:
mean(acme.df$averageBranchingFactor)
This yields 0.8636364.
How do I arrive at all the individual branching factors that together have a mean of 2.5?
Ideally I would like to create a data.frame that lists every folder, with a variable holding the branching factor of each folder. For example, I have this very simple folder structure:
top_level_folder
    sub_folder_1
    sub_folder_2
        sub_folder_3
Answering the question would involve creating an output that looks like this:
Folders             Subfolders (BranchingFactor)
top_level_folder    2
sub_folder_1        0
sub_folder_2        1
sub_folder_3        0
The first column can simply be generated by calling list.dirs("/Users/username/Downloads/top_level/"), but I don't know how to generate the second column. Note that the second column is non-recursive, meaning that folders within subfolders are not counted (i.e. top_level_folder contains only 2 subfolders, even though sub_folder_2 contains another folder, sub_folder_3).
If you want to see whether your solution scales or not, download the Rails codebase: https://github.com/rails/rails/archive/master.zip and try it on Rails' more complex file structure.
Upvotes: 0
Views: 174
Reputation: 1244
The averageBranchingFactor excludes leaves.
Side note: you can get acme directly using data(acme).
library(data.tree)
data(acme)
acme$averageBranchingFactor
acme$count
print(acme, abf = "averageBranchingFactor", "count")
This prints:
                          levelName abf count
1  Acme Inc.                        2.5     3
2   ¦--Accounting                   2.0     2
3   ¦   ¦--New Software             0.0     0
4   ¦   °--New Accounting Standards 0.0     0
5   ¦--Research                     2.0     2
6   ¦   ¦--New Product Line         0.0     0
7   ¦   °--New Labs                 0.0     0
8   °--IT                           3.0     3
9       ¦--Outsource                0.0     0
10      ¦--Go agile                 0.0     0
11      °--Switch to R              0.0     0
The implementation of ?averageBranchingFactor does not hold any secrets, so you can tweak it to your needs. Simply type averageBranchingFactor into your console (without parentheses):
function (node)
{
    t <- Traverse(node, filterFun = isNotLeaf)
    if (length(t) == 0)
        return(0)
    cnt <- Get(t, "count")
    if (!is.numeric(cnt))
        browser()
    return(mean(cnt))
}
In short, we traverse the tree (except leaves), and get the count value for each node. Finally, we calculate the mean.
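To get the individual values rather than their mean, the same two steps can be reused without the final mean(). This is only a sketch built from the calls shown in the function above; the variable names (non_leaves, branching, all_counts, per_node) are made up for illustration.

```r
library(data.tree)
data(acme)

# Same traversal as in the function above: visit every non-leaf node
non_leaves <- Traverse(acme, filterFun = isNotLeaf)

# Number of children of each non-leaf node, named by node
branching <- Get(non_leaves, "count")
branching
mean(branching)  # 2.5, the same value as acme$averageBranchingFactor

# One row per node, leaves included (leaves have a count of 0)
all_counts <- acme$Get("count")
per_node <- data.frame(Folders = names(all_counts),
                       BranchingFactor = unname(all_counts),
                       stringsAsFactors = FALSE)
per_node
```

Filtering per_node to rows with BranchingFactor > 0 gives back the four values that are averaged.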
Hope that helps.
Upvotes: 0
Reputation: 43354
You can adapt my answer on your other question, substituting list.dirs with recursive = FALSE for list.files:
library(purrr)
files <- .libPaths()[1] %>%    # omit for current directory or supply alternate path
    list.dirs() %>%
    map_df(~list(path = .x,
                 dirs = length(list.dirs(.x, recursive = FALSE))))
files
#> # A tibble: 4,457 x 2
#> path dirs
#> <chr> <int>
#> 1 /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314
#> 2 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind 4
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help 0
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html 0
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta 0
#> 6 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R 0
#> 7 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack 5
#> 8 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/help 0
#> 9 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/html 0
#> 10 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/acepack/libs 1
#> # ... with 4,447 more rows
mean(files$dirs[files$dirs != 0])
#> [1] 2.952949
or in base R,
files <- do.call(rbind, lapply(list.dirs(.libPaths()[1]), function(path){
    data.frame(path = path,
               dirs = length(list.dirs(path, recursive = FALSE)),
               stringsAsFactors = FALSE)
}))
head(files)
#> path dirs
#> 1 /Library/Frameworks/R.framework/Versions/3.4/Resources/library 314
#> 2 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind 4
#> 3 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/help 0
#> 4 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/html 0
#> 5 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/Meta 0
#> 6 /Library/Frameworks/R.framework/Versions/3.4/Resources/library/abind/R 0
mean(files$dirs[files$dirs != 0])
#> [1] 2.952949
Upvotes: 0
Reputation: 4370
You can simply loop over the folder structure and count the number of folders (non-recursively) at each level:
# Recreate the example structure from the question
dir.create("top_level_folder/sub_folder_2/sub_folder_3", recursive = TRUE)
dir.create("top_level_folder/sub_folder_1")
dirs <- list.dirs()  # every folder below the working directory, "." first
branching_factor <- integer(length(dirs))
for (i in seq_along(dirs)) {
  # count direct subfolders only
  branching_factor[i] <- length(list.dirs(path = dirs[i],
                                          full.names = FALSE, recursive = FALSE))
}
result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor)
result[-1,]  # drop the working directory itself
You could also use a shorter, more idiomatic sapply-based version of this code:
dirs <- list.dirs()
branching_factor <- sapply(dirs, function(x) length(list.dirs(x, full.names = FALSE, recursive = FALSE)))
result2 <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor,
                      row.names = NULL)[-1,]
The result looks like this:
> head(result2[rev(order(result2[,2])),])
Folders BranchingFactor
208 fixtures 24
122 fixtures 23
42 fixtures 18
440 core_ext 17
340 active_record 17
562 rails 16
Upvotes: 2
Reputation: 1631
Just correcting @Gilles's solution:
path <- "SO/rails-master/"
dirs <- list.dirs(path)
branching_factor <- integer(length(dirs))
for (i in seq_along(dirs)) {
  # count direct subfolders only
  branching_factor[i] <- length(list.dirs(path = dirs[i], recursive = FALSE))
}
result <- data.frame(Folders = basename(dirs), BranchingFactor = branching_factor)
> head(result)
Folders BranchingFactor
1 rails-master 14
2 .github 0
3 actioncable 4
4 app 1
5 assets 1
6 javascripts 1
Hope this helps.
Upvotes: 1
Reputation: 47330
I take a list of all folders recursively, then build a table of parent folder / subfolder pairs. From these pairs I can count the number of subfolders per folder.
This misses empty folders, though, so I merge the counts back onto the initial folder list with a left join and fill in the NAs with zeroes.
library(magrittr)  # provides %>%
path <- getwd()
all_folders <- path %>% list.dirs(full.names=TRUE,recursive=TRUE) %>%
  data.frame(stringsAsFactors=FALSE) %>% setNames("Folders")
# Last two path components of every folder: (parent name, folder name).
# Note: pairs are matched on bare folder names, so folders that share a
# name under different parents are pooled.
all_sub_folders <- all_folders$Folders %>%
  strsplit("/") %>%
  lapply(function(x){c(x[length(x)-1],x[length(x)])}) %>%
  do.call(rbind,.) %>%
  as.data.frame(stringsAsFactors=FALSE) %>%
  setNames(c("ParentFolders","Folders"))
# Count how often each name appears as a parent
output <- all_sub_folders$ParentFolders %>% table %>% as.data.frame(stringsAsFactors=FALSE) %>% setNames(c("Folders","SubFolders"))
# Left join so that folders without subfolders are kept, then set their count to 0
output <- merge(all_sub_folders,output,all.x = TRUE)[,c("Folders","SubFolders")]
output$SubFolders[is.na(output$SubFolders)] <- 0
output <- output[match(all_sub_folders$Folders,output$Folders),]
head(output)
# Folders SubFolders
# 2160 Rhome 126
# 17 acepack 5
# 856 help 1
# 992 html 9
# 1486 libs 124
# 1130 i386 0
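The same parent/child counting idea can also be sketched on full paths instead of bare folder names, which keeps same-named folders (help, html, ...) apart and needs only one list.dirs() call. This is a hedged variant, not the code above: it uses dirname() for the parent path and tabulate() for the counts, and builds the question's example tree in a temporary directory (root, dirs, parent_index and result_paths are illustrative names).

```r
# Throwaway copy of the example structure from the question
root <- file.path(tempdir(), "top_level_folder")
dir.create(file.path(root, "sub_folder_2", "sub_folder_3"), recursive = TRUE)
dir.create(file.path(root, "sub_folder_1"))

# All folders, including the root itself, as full paths
dirs <- list.dirs(root, full.names = TRUE, recursive = TRUE)

# Position of each folder's parent within dirs (NA for the root,
# whose parent lies outside the listed tree)
parent_index <- match(dirname(dirs), dirs)

# Number of times each folder occurs as a parent = its direct subfolders
result_paths <- data.frame(
  Folders = basename(dirs),
  SubFolders = tabulate(parent_index, nbins = length(dirs)),
  stringsAsFactors = FALSE
)
result_paths
```

Folders that never occur as a parent get a count of 0 directly, so no join or NA filling is needed.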
Upvotes: 0