R newbie

Reputation: 59

How to use tapply in a special way in R

I have a data frame like this:

     developerf                      filef          
1      egamma   /cvsroot/junit/junit/README.html 
2      egamma   /cvsroot/junit/junit/README.html 
3      egamma   /cvsroot/junit/junit/README.html 
4      egamma   /cvsroot/junit/junit/README.html 
5      egamma   /cvsroot/junit/junit/README.html 
6      egamma   /cvsroot/junit/junit/README.html
7      egamma   /cvsroot/junit/junit/README.html 
8      egamma   /cvsroot/junit/junit/README.html 
9      egamma   /cvsroot/junit/junit/README.html 
10     egamma   /cvsroot/junit/junit/README.html 
11     egamma   /cvsroot/junit/junit/README.html
12     egamma   /cvsroot/junit/junit/build.xml 
13     egamma   /cvsroot/junit/junit/build.xml 
14     egamma   /cvsroot/junit/junit/build.xml 
15     egamma   /cvsroot/junit/junit/build.xml 
16     emeade   /cvsroot/junit/junit/build.xml 
17     emeade   /cvsroot/junit/junit/build.xml 
18     emeade   /cvsroot/junit/junit/build.xml 
19     emeade   /cvsroot/junit/junit/build.xml 
20     egamma   /cvsroot/junit/junit/build.xml

I already wrote some code that tells me how many files were changed exactly n times.

before <- sort(table(jupit$filef), decreasing = TRUE)
t <- table(factor(before, levels = c(1, 2, 3, 5, 9, 10, 11)))

1  2  3  5  9 10 11 
0  0  0  0  1  0  1 

Now I want to use tapply to find out how many files have been touched by exactly 1, 2, ..., 10 developers. I know that I need to use R's length and factor functions, but how?

Outcome should look like this:

1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 0 0

Upvotes: 1

Views: 56

Answers (2)

jay.sf

Reputation: 72633

You could convert your file column to a factor beforehand. The levels= (letters in the toy data below) should be the unique files of your data, i.e. unique(jupit$filef).

dat$file <- factor(dat$file, levels = letters, labels = seq_along(letters))

Then, a solution using tapply:

res1 <- with(dat, tapply(dev, file, length)) >= 1
res1[is.na(res1)] <- 0
res1
 # 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
 # 1  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0 

or a solution using table:

+(colSums(with(dat, table(dev, file))) >= 1)
 # 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
 # 1  0  0  0  1  0  0  0  0  1  0  0  0  0  0  0  1  0  0  0  0  0  0  0  0  0 

Toy data:

dat <- structure(list(dev = c("A", "A", "B", "A", "B", "D", "D", "C", 
"C", "A", "B", "C", "D", "A", "B", "B", "A", "A", "D", "C"), 
    file = structure(c(5L, 17L, 5L, 5L, 5L, 1L, 17L, 17L, 10L, 
    17L, 5L, 1L, 5L, 10L, 1L, 17L, 5L, 1L, 17L, 17L), .Label = c("a", 
    "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", 
    "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", 
    "z"), class = "factor")), out.attrs = list(dim = structure(4:5, .Names = c("dev", 
"file")), dimnames = list(dev = c("dev=A", "dev=B", "dev=C", 
"dev=D"), file = c("file=q", "file=e", "file=a", "file=j", "file=d"
))), row.names = c(NA, -20L), class = "data.frame")

dat
#    dev file
# 1    A    e
# 2    A    q
# 3    B    e
# 4    A    e
# 5    B    e
# 6    D    a
# 7    D    q
# 8    C    q
# 9    C    j
# 10   A    q
# 11   B    e
# 12   C    a
# 13   D    e
# 14   A    j
# 15   B    a
# 16   B    q
# 17   A    e
# 18   A    a
# 19   D    q
# 20   C    q
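
The indicators above show whether a file was touched at all. To get the tally the question asks for (how many files were touched by exactly n developers), one possible sketch on the same toy data is to count distinct developers per file first. The data frame is retyped here from the printout so the block runs on its own; the names touched, devs_per_file and res are illustrative only.

```r
# Toy data retyped from the printout above (assumption: plain character columns)
dat <- data.frame(
  dev  = c("A", "A", "B", "A", "B", "D", "D", "C", "C", "A",
           "B", "C", "D", "A", "B", "B", "A", "A", "D", "C"),
  file = c("e", "q", "e", "e", "e", "a", "q", "q", "j", "q",
           "e", "a", "e", "j", "a", "q", "e", "a", "q", "q"),
  stringsAsFactors = FALSE
)

# Cross-tabulate file by developer; TRUE means that developer touched that file
touched <- table(dat$file, dat$dev) > 0

# Number of distinct developers per file
devs_per_file <- rowSums(touched)

# Number of files touched by exactly 1, 2, ..., 10 developers
res <- table(factor(devs_per_file, levels = 1:10))
res
```

Here one file (j) has two developers, one (e) has three, and two (a, q) have four, so the table has non-zero entries only under 2, 3 and 4.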

Upvotes: 1

Ronak Shah

Reputation: 388817

I think what you are looking for is:

table(factor(tapply(df$developerf, df$filef, function(x) 
                    length(unique(x))), levels = 1:10))

# 1  2  3  4  5  6  7  8  9 10 
# 1  1  0  0  0  0  0  0  0  0 

We can break it down to understand what happens at each step.

tapply(df$developerf, df$filef, function(x) length(unique(x)))

This gives the number of unique developers who have touched each file.
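
To see this intermediate result concretely, here is a minimal sketch. The data frame is rebuilt by hand from the 20 rows printed in the question, so its construction is an assumption rather than the asker's actual object.

```r
# Data rebuilt from the question's printout (assumption): rows 1-11 are
# README.html, rows 12-20 are build.xml; rows 16-19 belong to emeade
df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

# One entry per file: the number of distinct developers who touched it
devs_per_file <- tapply(df$developerf, df$filef,
                        function(x) length(unique(x)))
devs_per_file
```

README.html was touched only by egamma, so its entry is 1; build.xml was touched by both egamma and emeade, so its entry is 2.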

We convert the counts to a factor with levels from 1 to 10, so that developer counts with no matching files still appear as 0 in the final table.

factor(tapply(df$developerf, df$filef, function(x) 
              length(unique(x))), levels = 1:10)

Finally, we use table to count how many files have been touched by exactly 1, 2, ..., 10 developers.

table(factor(tapply(df$developerf, df$filef, function(x) 
                   length(unique(x))), levels = 1:10))
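
If this is needed more than once, the steps could be wrapped in a small helper. This is only a sketch: the function name count_files_by_n_devs and its max_devs argument are hypothetical, and the data frame is rebuilt by hand from the rows printed in the question.

```r
# Hypothetical helper: tally files by their number of distinct developers.
# Files with more than max_devs developers become NA in factor() and are
# silently dropped by table(), so choose max_devs large enough.
count_files_by_n_devs <- function(developer, file, max_devs = 10) {
  devs_per_file <- tapply(developer, file, function(x) length(unique(x)))
  table(factor(devs_per_file, levels = seq_len(max_devs)))
}

# Data rebuilt from the question's printout (assumption)
df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

res <- count_files_by_n_devs(df$developerf, df$filef)
res
```

On this data the result has a 1 under both 1 and 2 and zeros elsewhere, which matches the outcome requested in the question.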

Upvotes: 1
