Reputation: 59
There is a Dataframe like this
developerf filef
1 egamma /cvsroot/junit/junit/README.html
2 egamma /cvsroot/junit/junit/README.html
3 egamma /cvsroot/junit/junit/README.html
4 egamma /cvsroot/junit/junit/README.html
5 egamma /cvsroot/junit/junit/README.html
6 egamma /cvsroot/junit/junit/README.html
7 egamma /cvsroot/junit/junit/README.html
8 egamma /cvsroot/junit/junit/README.html
9 egamma /cvsroot/junit/junit/README.html
10 egamma /cvsroot/junit/junit/README.html
11 egamma /cvsroot/junit/junit/README.html
12 egamma /cvsroot/junit/junit/build.xml
13 egamma /cvsroot/junit/junit/build.xml
14 egamma /cvsroot/junit/junit/build.xml
15 egamma /cvsroot/junit/junit/build.xml
16 emeade /cvsroot/junit/junit/build.xml
17 emeade /cvsroot/junit/junit/build.xml
18 emeade /cvsroot/junit/junit/build.xml
19 emeade /cvsroot/junit/junit/build.xml
20 egamma /cvsroot/junit/junit/build.xml
I already wrote some code which tells me how many files were changed exactly n times:
before<- sort(table(jupit$filef), decreasing = TRUE)
t<- table(factor(before,levels = c(1,2,3,5,9,10,11)))
1 2 3 5 9 10 11
0 0 0 0 1 0 1
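To try the answers, the data frame shown above can be rebuilt with rep() instead of typing out the 20 rows. The sketch below does that and reruns the change-count step. The name jupit and the columns developerf and filef come from the question. The name change_counts is only illustrative; it replaces t so that base::t() is not masked.

```r
# Rebuild the 20-row data frame from the question:
# rows 1-15 egamma, rows 16-19 emeade, row 20 egamma;
# rows 1-11 touch README.html, rows 12-20 touch build.xml.
jupit <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

# Number of changes (rows) per file, most-changed file first
before <- sort(table(jupit$filef), decreasing = TRUE)
print(before)

# How many files were changed exactly n times, for the chosen n values
change_counts <- table(factor(before, levels = c(1, 2, 3, 5, 9, 10, 11)))
print(change_counts)
```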
Now we want to use tapply to find out how many files have been touched by exactly 1, 2, ..., 10 developers. I know that I need to use the length and factor functions in R, but how?
The outcome should look like this:
1 2 3 4 5 6 7 8 9 10
1 1 0 0 0 0 0 0 0 0
Upvotes: 1
Views: 56
Reputation: 72633
You could factorize your file column beforehand. The levels= (here letters) should be the unique files of your data, i.e. unique(jupit$filef).
dat$file <- factor(dat$file, levels=letters, labels=seq(letters))
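Applied to the question's data, that step might look like the sketch below. The data frame is rebuilt with rep(), and files is an illustrative name. With levels = unique(...), the files keep their order of first appearance, and labels = seq_along(...) renumbers them 1, 2, ...:

```r
# Question's data, rebuilt with rep()
jupit <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

files <- unique(jupit$filef)  # README.html first, then build.xml
# Replace the file paths by their position in `files`
jupit$filef <- factor(jupit$filef, levels = files, labels = seq_along(files))

levels(jupit$filef)  # "1" "2"
table(jupit$filef)   # number of changes per numbered file
```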
Then, a solution using tapply:
res1 <- with(dat, tapply(dev, file, length)) >= 1
res1[is.na(res1)] <- 0
res1
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
# 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
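To see what the >= 1 comparison is applied to, the sketch below rebuilds the toy data from its printout and keeps the raw tapply() result. It uses plain character vectors rather than the dput() structure, and counts is an illustrative name. Letters that no developer touched come back as NA, which is why the NAs are set to 0 above:

```r
# Toy data rebuilt from the printed data frame
dat <- data.frame(
  dev  = c("A", "A", "B", "A", "B", "D", "D", "C", "C", "A",
           "B", "C", "D", "A", "B", "B", "A", "A", "D", "C"),
  file = c("e", "q", "e", "e", "e", "a", "q", "q", "j", "q",
           "e", "a", "e", "j", "a", "q", "e", "a", "q", "q"),
  stringsAsFactors = FALSE
)
# All 26 letters become levels, relabelled 1..26
dat$file <- factor(dat$file, levels = letters, labels = seq(letters))

# Rows (changes) per file; levels without any rows give NA
counts <- with(dat, tapply(dev, file, length))
counts[c("1", "5", "10", "17")]  # files a, e, j, q
sum(is.na(counts))               # number of untouched letters
```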
Or a solution using table:
+(colSums(with(dat, table(dev, file))) >= 1)
# 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
# 1 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
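The same table() approach can also answer the original question if each cell is first reduced to "touched or not" before the columns are summed. In that case, colSums(table(...) > 0) gives the number of distinct developers per file. This is a variant sketched here, not part of the answer above. It uses the question's data rebuilt with rep(), and touched, devs_per_file and res are illustrative names:

```r
# Question's data, rebuilt with rep()
jupit <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

# developer x file table; > 0 marks "this developer touched this file"
touched <- with(jupit, table(developerf, filef)) > 0
devs_per_file <- colSums(touched)  # distinct developers per file

# How many files have exactly 1..10 distinct developers
res <- table(factor(devs_per_file, levels = 1:10))
res
```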
Toy data:
dat <- structure(list(dev = c("A", "A", "B", "A", "B", "D", "D", "C",
"C", "A", "B", "C", "D", "A", "B", "B", "A", "A", "D", "C"),
file = structure(c(5L, 17L, 5L, 5L, 5L, 1L, 17L, 17L, 10L,
17L, 5L, 1L, 5L, 10L, 1L, 17L, 5L, 1L, 17L, 17L), .Label = c("a",
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
"n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y",
"z"), class = "factor")), out.attrs = list(dim = structure(4:5, .Names = c("dev",
"file")), dimnames = list(dev = c("dev=A", "dev=B", "dev=C",
"dev=D"), file = c("file=q", "file=e", "file=a", "file=j", "file=d"
))), row.names = c(NA, -20L), class = "data.frame")
dat
# dev file
# 1 A e
# 2 A q
# 3 B e
# 4 A e
# 5 B e
# 6 D a
# 7 D q
# 8 C q
# 9 C j
# 10 A q
# 11 B e
# 12 C a
# 13 D e
# 14 A j
# 15 B a
# 16 B q
# 17 A e
# 18 A a
# 19 D q
# 20 C q
Upvotes: 1
Reputation: 388817
I think what you are looking for is:
table(factor(tapply(df$developerf, df$filef, function(x)
length(unique(x))), levels = 1:10))
# 1 2 3 4 5 6 7 8 9 10
# 1 1 0 0 0 0 0 0 0 0
We can break it down to understand what happens at each step.
tapply(df$developerf, df$filef, function(x) length(unique(x)))
gives the number of unique developers who have touched each file.
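A small sketch on the question's data shows why unique() is needed. The data is rebuilt with rep(), and changes and devs are illustrative names. length() alone counts changes per file, while length(unique(x)) counts distinct developers:

```r
# Question's data, rebuilt with rep()
df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

# Changes (rows) per file
changes <- tapply(df$developerf, df$filef, length)
# Distinct developers per file
devs <- tapply(df$developerf, df$filef, function(x) length(unique(x)))
changes
devs
```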
We then convert these counts to a factor, setting the levels from 1 to 10 so that counts which never occur are still kept as levels.
factor(tapply(df$developerf, df$filef, function(x)
length(unique(x))), levels = 1:10)
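Setting the levels is what makes the zero columns appear. The sketch below compares the two forms on the question's data. The data is rebuilt with rep(), and devs is an illustrative name:

```r
# Question's data, rebuilt with rep()
df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)
devs <- tapply(df$developerf, df$filef, function(x) length(unique(x)))

table(devs)                         # only the observed counts, 1 and 2
table(factor(devs, levels = 1:10))  # every count 1..10, unobserved ones as 0
```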
Finally, we use table to count how many files have been touched by 1, 2, ..., 10 developers.
table(factor(tapply(df$developerf, df$filef, function(x)
length(unique(x))), levels = 1:10))
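If this is needed repeatedly, the one-liner can be wrapped in a small function. files_by_dev_count and its max_devs argument are hypothetical names and are not part of the answer. This is a sketch that assumes the same column layout as the question:

```r
# Hypothetical helper: how many files were touched by exactly 1..max_devs developers
files_by_dev_count <- function(developer, file, max_devs = 10) {
  # distinct developers per file
  devs <- tapply(developer, file, function(x) length(unique(x)))
  # fixed levels so that unobserved counts show up as 0
  table(factor(devs, levels = seq_len(max_devs)))
}

# Question's data, rebuilt with rep()
df <- data.frame(
  developerf = c(rep("egamma", 15), rep("emeade", 4), "egamma"),
  filef = c(rep("/cvsroot/junit/junit/README.html", 11),
            rep("/cvsroot/junit/junit/build.xml", 9)),
  stringsAsFactors = FALSE
)

files_by_dev_count(df$developerf, df$filef)
files_by_dev_count(df$developerf, df$filef, max_devs = 3)
```

Because the levels stop at max_devs, a file touched by more developers than that becomes NA inside factor(). It is then silently left out of the table, so max_devs has to be at least the largest per-file developer count.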
Upvotes: 1