R newbie

Reputation: 59

Sapply for counting changes

I want to get the maximum and minimum number of changes per file for each file type. My data frame (DF):

filef                                     filetypedef    dev
 [1]/cvsroot/junit/junit/README.html      html           "egamma" 
 [2]/cvsroot/junit/junit/README.html      html           "egamma" 
 [3]/cvsroot/junit/junit/README.html      html           "egamma" 
 [4]/cvsroot/junit/junit/README.html      html           "egamma" 
 [5]/cvsroot/junit/junit/README.html      html           "egamma" 
 [6]/cvsroot/junit/junit/README.html      html           "egamma" 
 [7]/cvsroot/junit/junit/SUPPORT.html     html           "emeade"
 [8]/cvsroot/junit/junit/SUPPORT.html     html           "emeade"
 [9]/cvsroot/junit/junit/SUPPORT.html     html           "egamma"
[10]/cvsroot/junit/junit/SUPPORT.html     html           "egamma"
[11]/cvsroot/junit/junit/SUPPORT.html     html           "emeade"
[12]/cvsroot/junit/junit/build.xml        xml            "egamma"
[13]/cvsroot/junit/junit/build.xml        xml            "emeade"
[14]/cvsroot/junit/junit/build.xml        xml            "emeade"
[15]/cvsroot/junit/junit/build.xml        xml            "emeade"
[16]/cvsroot/junit/junit/build.xml        xml            "emeade"
[17]/cvsroot/junit/junit/build.xml        xml            "emeade"
[18]/cvsroot/junit/junit/new.xml          xml            "egamma"
[19]/cvsroot/junit/junit/new.xml          xml            "egamma"
[20]/cvsroot/junit/junit/new.xml          xml            "egamma"

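Right now my function shows the maximum and minimum number of changes for each file type as a whole, but I want it to also differentiate between the file names within a type. For example, for the xml type, one file was changed 6 times (the maximum) and another 3 times (the minimum).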

How can I make this happen?

This is my function

filetype.table <- function(x){ count(filename, filetypedef)
  mean <- sort(sapply(table(x$filetypedef),mean), decreasing = TRUE)
  num <- sort(sapply(table(x$filetypedef),length), decreasing = TRUE)
  min <- sort(sapply(table(x$filetypedef),min), decreasing = TRUE)
  max <- sort(sapply(table(x$filetypedef),max), decreasing = TRUE)
rbind(mean, num, min, max)
}

num is the number of different files of that file type
min and max are the minimum and maximum number of changes to a single file of that file type
mean is the mean number of changes per file of that file type

At the moment the function only works with the file types, but I want it to take the filef column into account as well, so that changes are counted per file within each type. With the data above, xml files were changed at most 6 times and at least 3 times.

The output should be like:

      html   xml
min   5      3
max   6      6
mean  5.5    4.5
num   2      2

Upvotes: 0

Views: 58

Answers (2)

randr

Reputation: 314

The trick here is to pass multiple columns to table.

changes = table(df[, c("filef", "filetypedef")])
apply(changes, 2, range)
      filetypedef
       html xml
  [1,]    0   0
  [2,]    6   6

Here the minimum is zero because each file contributes a zero count to every file type it does not belong to. It looks like you are not interested in those zeroes, so you can get rid of them by setting them to NA.

changes[changes==0] = NA
apply(changes, 2, range, na.rm = TRUE)

This gives the result described in your question. It also scales to any number of file types.

      filetypedef
       html xml
  [1,]    5   3
  [2,]    6   6

To add other metrics (as in the updated question), just rbind the results into one matrix:

rbind(
    mean = apply(changes, 2, mean, na.rm = TRUE),
    total = apply(changes, 2, sum, na.rm = TRUE),
    min = apply(changes, 2, min, na.rm = TRUE),
    max = apply(changes, 2, max, na.rm = TRUE)
)
      html xml
mean   5.5 4.5
total 11.0 9.0
min    5.0 3.0
max    6.0 6.0

Note: this code only uses base R functions (as stipulated in the revised question).
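The matrix above reports a total row, while the question asks for num, the number of distinct files per type. A minimal base R sketch of one way to get that row is shown here. The data frame is rebuilt from the question's listing on the assumption that the [1] to [20] labels are print indices and not part of the file names; the dev column is left out because it is not needed, and the name result is only illustrative.

```r
# Sample data rebuilt from the question (assumption: "[1]".."[20]" are
# row labels, not part of the file names; the dev column is omitted).
df <- data.frame(
  filef = rep(
    c("/cvsroot/junit/junit/README.html",
      "/cvsroot/junit/junit/SUPPORT.html",
      "/cvsroot/junit/junit/build.xml",
      "/cvsroot/junit/junit/new.xml"),
    times = c(6, 5, 6, 3)
  ),
  filetypedef = rep(c("html", "xml"), times = c(11, 9)),
  stringsAsFactors = FALSE
)

# Cross-tabulate files against types; a zero cell only means
# "this file is not of this type", so it is replaced by NA.
changes <- table(df[, c("filef", "filetypedef")])
changes[changes == 0] <- NA

result <- rbind(
  min  = apply(changes, 2, min,  na.rm = TRUE),
  max  = apply(changes, 2, max,  na.rm = TRUE),
  mean = apply(changes, 2, mean, na.rm = TRUE),
  num  = colSums(!is.na(changes))  # files with at least one change, per type
)
print(result)
```

Counting the non-NA cells per column gives the number of files of each type, since every file that appears in the data has at least one change.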

Upvotes: 1

Martin Gal

Reputation: 16978

I suppose the [1] to [20] labels aren't really part of your file names, so I remove them.

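library(dplyr)

df %>% 
  # strip the leading "[1]" ... "[20]" labels from the file names
  mutate(filename = gsub("\\[[0-9]{1,2}]", "", filef)) %>% 
  # one row per file: n is the number of changes to that file
  count(filename, filetypedef) %>% 
  group_by(filetypedef) %>% 
  summarise(min = min(n), max = max(n))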

This gives

# A tibble: 2 x 3
  filetypedef   min   max
  <chr>       <int> <int>
1 html            5     6
2 xml             3     6
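The same pipeline can probably be extended to cover the mean and num rows from the question as well. The sketch below assumes dplyr is installed and uses a shortened, hypothetical reconstruction of the data (bare file names without the path or the [n] labels); stats is an illustrative name.

```r
library(dplyr)

# Hypothetical reconstruction of the question's data:
# README.html 6 changes, SUPPORT.html 5, build.xml 6, new.xml 3.
df <- data.frame(
  filef = rep(c("README.html", "SUPPORT.html", "build.xml", "new.xml"),
              times = c(6, 5, 6, 3)),
  filetypedef = rep(c("html", "xml"), times = c(11, 9)),
  stringsAsFactors = FALSE
)

stats <- df %>%
  count(filef, filetypedef) %>%   # one row per file, n = changes to that file
  group_by(filetypedef) %>%
  summarise(
    min  = min(n),
    max  = max(n),
    mean = mean(n),
    num  = n_distinct(filef)      # number of different files of this type
  )

print(stats)
```

Unlike the matrix in the question, this result has one row per file type and one column per statistic; it can be transposed afterwards if the original layout is needed.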

Upvotes: 0
