How to use tapply in a special way in R

Question

I have a data frame like this:

     developerf                      filef          
1      egamma   /cvsroot/junit/junit/README.html 
2      egamma   /cvsroot/junit/junit/README.html 
3      egamma   /cvsroot/junit/junit/README.html 
4      egamma   /cvsroot/junit/junit/README.html 
5      egamma   /cvsroot/junit/junit/README.html 
6      egamma   /cvsroot/junit/junit/README.html
7      egamma   /cvsroot/junit/junit/README.html 
8      egamma   /cvsroot/junit/junit/README.html 
9      egamma   /cvsroot/junit/junit/README.html 
10     egamma   /cvsroot/junit/junit/README.html 
11     egamma   /cvsroot/junit/junit/README.html
12     egamma   /cvsroot/junit/junit/build.xml 
13     egamma   /cvsroot/junit/junit/build.xml 
14     egamma   /cvsroot/junit/junit/build.xml 
15     egamma   /cvsroot/junit/junit/build.xml 
16     emeade   /cvsroot/junit/junit/build.xml 
17     emeade   /cvsroot/junit/junit/build.xml 
18     emeade   /cvsroot/junit/junit/build.xml 
19     emeade   /cvsroot/junit/junit/build.xml 
20     egamma   /cvsroot/junit/junit/build.xml

I already wrote code that tells me how many files were changed exactly n times.

before <- sort(table(jupit$filef), decreasing = TRUE)
t <- table(factor(before, levels = c(1,2,3,5,9,10,11)))

1  2  3  5  9 10 11 
0  0  0  0  1  0  1

Now I want to use tapply to find out how many files have been touched by exactly 1, 2, ..., 10 developers. I know that I need to use R's length and factor functions, but how?

The output should look like this:

1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 0 0

Ronak Shah · Accepted Answer

I think what you are looking for is:

table(factor(tapply(df$developerf, df$filef, function(x) 
                    length(unique(x))), levels = 1:10))

# 1  2  3  4  5  6  7  8  9 10 
# 1  1  0  0  0  0  0  0  0  0

We can break it down to understand what happens at each step.

tapply(df$developerf, df$filef, function(x) length(unique(x)))

gives the number of unique developers who have touched each file.
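To try this step on its own, here is a minimal sketch that rebuilds the 20 rows shown in the question. The data frame name df and the helper variables readme, build and devs_per_file are assumptions made for illustration; the question itself calls its data frame jupit.

```r
# Rebuild the sample data from the question (names here are assumed).
readme <- "/cvsroot/junit/junit/README.html"
build  <- "/cvsroot/junit/junit/build.xml"

df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef      = c(rep(readme, 11), rep(build, 9)),
  stringsAsFactors = TRUE
)

# One entry per file: the number of distinct developers among that file's rows.
devs_per_file <- tapply(df$developerf, df$filef,
                        function(x) length(unique(x)))

devs_per_file[readme]  # 1: only egamma touched README.html
devs_per_file[build]   # 2: egamma and emeade touched build.xml
```

The result is indexed by file name here rather than printed whole, because the order of the file names depends on the locale's sort order.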

We convert the counts to a factor, setting the levels from 1 to 10.

factor(tapply(df$developerf, df$filef, function(x) 
              length(unique(x))), levels = 1:10)
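The reason for the explicit levels can be seen with a small sketch. The vector counts below is a hypothetical stand-in for the tapply result (one file with 1 developer, one with 2), and the variable names are illustrative only.

```r
# Hypothetical developers-per-file counts for two files.
counts <- c(1, 2)

# Without explicit levels, table() only reports the values that occur.
observed_only <- table(counts)
length(observed_only)   # 2 categories: "1" and "2"

# With levels = 1:10, every category from 1 to 10 is kept,
# and the categories that never occur get a count of 0.
all_levels <- table(factor(counts, levels = 1:10))
length(all_levels)      # 10 categories
names(all_levels)       # "1" "2" ... "10"
```

Note that any value outside the given levels becomes NA in the factor and is dropped by table by default, so a file touched by 11 or more developers would not appear in the result.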

Finally, we use table to count how many files have been touched by 1, 2, ..., 10 developers.

table(factor(tapply(df$developerf, df$filef, function(x) 
                   length(unique(x))), levels = 1:10))
