Reputation: 59
I want to get the maximum and minimum number of changes per file for each file type. My data frame df:
filef filetypedef dev
[1]/cvsroot/junit/junit/README.html html "egamma"
[2]/cvsroot/junit/junit/README.html html "egamma"
[3]/cvsroot/junit/junit/README.html html "egamma"
[4]/cvsroot/junit/junit/README.html html "egamma"
[5]/cvsroot/junit/junit/README.html html "egamma"
[6]/cvsroot/junit/junit/README.html html "egamma"
[7]/cvsroot/junit/junit/SUPPORT.html html "emeade"
[8]/cvsroot/junit/junit/SUPPORT.html html "emeade"
[9]/cvsroot/junit/junit/SUPPORT.html html "egamma"
[10]/cvsroot/junit/junit/SUPPORT.html html "egamma"
[11]/cvsroot/junit/junit/SUPPORT.html html "emeade"
[12]/cvsroot/junit/junit/build.xml xml "egamma"
[13]/cvsroot/junit/junit/build.xml xml "emeade"
[14]/cvsroot/junit/junit/build.xml xml "emeade"
[15]/cvsroot/junit/junit/build.xml xml "emeade"
[16]/cvsroot/junit/junit/build.xml xml "emeade"
[17]/cvsroot/junit/junit/build.xml xml "emeade"
[18]/cvsroot/junit/junit/new.xml xml "egamma"
[19]/cvsroot/junit/junit/new.xml xml "egamma"
[20]/cvsroot/junit/junit/new.xml xml "egamma"
Right now my function shows the maximum and minimum number of changes for every file type, but I want it to also differentiate between the file names. That means file type xml was changed at most 6 times (build.xml) and at least 3 times (new.xml).
How can I make this happen?
This is my function
filetype.table <- function(x){
  count(filename, filetypedef)
  mean <- sort(sapply(table(x$filetypedef),mean), decreasing = TRUE)
  num <- sort(sapply(table(x$filetypedef),length), decreasing = TRUE)
  min <- sort(sapply(table(x$filetypedef),min), decreasing = TRUE)
  max <- sort(sapply(table(x$filetypedef),max), decreasing = TRUE)
  rbind(mean, num, min, max)
}
num is the number of different files of that file type
min and max are the minimum and maximum number of changes per file of that file type
mean is the mean number of changes per file of that file type
At the moment the function only works with the file types, but I want it to take the filef column into account as well, so that it differentiates between the file names within each type. For example, file type xml should report a maximum of 6 and a minimum of 3 changes.
The output should be like:
html xml
min 5 3
max 6 6
mean 5.5 4.5
num 2 2
Upvotes: 0
Views: 58
Reputation: 314
The trick here is to pass multiple columns to table().
changes = table(df[, c("filef", "filetypedef")])
apply(changes, 2, range)
filetypedef
html xml
[1,] 0 0
[2,] 6 6
Here the minimum will often be zero, because every file has a count of 0 under the file types it does not belong to. It looks like you are not interested in these zeroes, so you can get rid of them by setting them to NA.
changes[changes==0] = NA
apply(changes, 2, range, na.rm = TRUE)
This gives the result described in your question. It also scales to any number of file types.
filetypedef
html xml
[1,] 5 3
[2,] 6 6
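For anyone who wants to reproduce this, here is a minimal sketch that rebuilds the sample data from the question and runs the two steps above. It assumes the [1] to [20] labels are row numbers rather than part of the file names, and it leaves out the dev column because it is not needed here; the variable name res is only for illustration.

```r
# Rebuild the sample data from the question (row labels [1]..[20] dropped,
# dev column omitted because it is not used).
df <- data.frame(
  filef = rep(
    c("/cvsroot/junit/junit/README.html",
      "/cvsroot/junit/junit/SUPPORT.html",
      "/cvsroot/junit/junit/build.xml",
      "/cvsroot/junit/junit/new.xml"),
    times = c(6, 5, 6, 3)
  ),
  filetypedef = rep(c("html", "xml"), times = c(11, 9)),
  stringsAsFactors = FALSE
)

# Cross-tabulate files (rows) against file types (columns).
changes <- table(df[, c("filef", "filetypedef")])

# A file has a count of 0 under every type it does not belong to;
# mark those cells as NA so they are ignored.
changes[changes == 0] <- NA

# Row 1 holds the minimum, row 2 the maximum number of changes per file type.
res <- apply(changes, 2, range, na.rm = TRUE)
print(res)
```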
To add other metrics (as in the updated question), just rbind the results into one matrix:
rbind(
mean = apply(changes, 2, mean, na.rm = TRUE),
total = apply(changes, 2, sum, na.rm = TRUE),
min = apply(changes, 2, min, na.rm = TRUE),
max = apply(changes, 2, max, na.rm = TRUE)
)
html xml
mean 5.5 4.5
total 11.0 9.0
min 5.0 3.0
max 6.0 6.0
Note: this code only uses base R functions (as stipulated in the revised question).
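The question's num row asks for the number of different files per type, which is not the same as the total above. One possible way to get it, sketched here with shortened, hypothetical file names standing in for the real paths, is to count the non-NA cells in each column:

```r
# Hypothetical stand-in for the question's data: one row per recorded change,
# with shortened file names.
df <- data.frame(
  filef = rep(c("README.html", "SUPPORT.html", "build.xml", "new.xml"),
              times = c(6, 5, 6, 3)),
  filetypedef = rep(c("html", "xml"), times = c(11, 9)),
  stringsAsFactors = FALSE
)

changes <- table(df[, c("filef", "filetypedef")])
changes[changes == 0] <- NA

# Number of different files per type = number of non-NA cells per column.
num <- apply(changes, 2, function(col) sum(!is.na(col)))

stats <- rbind(
  min  = apply(changes, 2, min,  na.rm = TRUE),
  max  = apply(changes, 2, max,  na.rm = TRUE),
  mean = apply(changes, 2, mean, na.rm = TRUE),
  num  = num
)
print(stats)
```

This still uses only base R; the helper is an anonymous function rather than a named one to keep the sketch short.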
Upvotes: 1
Reputation: 16978
I suppose the [1] - [20] prefixes aren't really part of your filenames, so I remove them.
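library(dplyr)

df %>%
  # strip the leading "[n]" row label from each file name
  mutate(filename = gsub("\\[[0-9]{1,2}]", "", filef)) %>%
  # one row per file, with n = number of changes to that file
  count(filename, filetypedef) %>%
  group_by(filetypedef) %>%
  summarise(min = min(n), max = max(n))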
This gives
# A tibble: 2 x 3
filetypedef min max
<chr> <int> <int>
1 html 5 6
2 xml 3 6
Upvotes: 0