Reputation: 33
Help with R: I need to group by a column and count occurrences of values in a set of columns.
Here is my data frame
ID Ob1 Ob2 Ob3 Ob4
3792 0 0 0 1
3792 0 0 -1 0
3792 1 -2 -1 0
3792 2 -2 -1 0
8060 -1 0 -2 2
8060 -1 0 -3 0
8060 0 0 0 0
13098 0 0 0 0
13098 -1 0 -1 0
13098 0 0 0 0
I want to groupby ID and count the values of numbers in Ob1, Ob2, Ob3, Ob4. so my result should look like below;
Group -3 -2 -1 0 1 2
3792 2 3 8 2 1
8060 1 1 2 7 1
13098 2 10
I tried
table(unlist(df))
but I loose the groupby info. I get below
-3 -2 -1 0 1 2 46 3792 8060 13098
1 3 7 25 2 2 10 4 3 3
Upvotes: 2
Views: 1764
Reputation: 32538
with(reshape2::melt(df1, id.vars = "ID"), table(ID, value))
# value
#ID -3 -2 -1 0 1 2
# 3792 0 2 3 8 2 1
# 8060 1 1 2 7 0 1
# 13098 0 0 2 10 0 0
Upvotes: 1
Reputation: 886938
Using table
from base R
table(data.frame(col1 = df1$ID, col2 = unlist(df1[-1])))
# col2
#col1 -3 -2 -1 0 1 2
# 3792 0 2 3 8 2 1
# 8060 1 1 2 7 0 1
# 13098 0 0 2 10 0 0
There is also no need to create a data.frame
. The above is done so that the lengths are same (as there is recycling in data.frame
) and table
needs arguments to be of same length. So, a variant of the above should also work well
table(rep(df1$ID, ncol(df1)-1), unlist(df1[-1]))
If we are unlist
ing the whole dataset, it is just a single vector
, instead we need two vectors, one for the first column 'ID' and the second for all other columns except the 'ID'
df1 <- structure(list(ID = c(3792L, 3792L, 3792L, 3792L, 8060L, 8060L,
8060L, 13098L, 13098L, 13098L), Ob1 = c(0L, 0L, 1L, 2L, -1L,
-1L, 0L, 0L, -1L, 0L), Ob2 = c(0L, 0L, -2L, -2L, 0L, 0L, 0L,
0L, 0L, 0L), Ob3 = c(0L, -1L, -1L, -1L, -2L, -3L, 0L, 0L, -1L,
0L), Ob4 = c(1L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 0L)),
class = "data.frame", row.names = c(NA,
-10L))
Upvotes: 1
Reputation: 814
This just requires dplyr
df<-read.table(text="
ID Ob1 Ob2 Ob3 Ob4
3792 0 0 0 1
3792 0 0 -1 0
3792 1 -2 -1 0
3792 2 -2 -1 0
8060 -1 0 -2 2
8060 -1 0 -3 0
8060 0 0 0 0
13098 0 0 0 0
13098 -1 0 -1 0
13098 0 0 0 0
",header=T)
df
library(dplyr)
melt(df,id.vars = "ID")%>%
cast(.,ID~value)
Result:
Aggregation requires fun.aggregate: length used as default
ID -3 -2 -1 0 1 2
1 3792 0 2 3 8 2 1
2 8060 1 1 2 7 0 1
3 13098 0 0 2 10 0 0
Upvotes: 0
Reputation: 52208
Try using dplyr
. Assume the datafame is called df
library(dplyr)
df %>%
group_by(ID) %>%
summarise(Obs1 = sum(Obs1), Obs2 = sum(Obs2), Obs3 = sum(Obs3), Obs4 = sum(Obs4))
Upvotes: 0
Reputation: 39858
One dplyr
and tidyr
possibility could be:
df %>%
gather(var, val, -ID) %>%
select(-var) %>%
count(ID, val) %>%
spread(val, n, fill = 0)
ID `-3` `-2` `-1` `0` `1` `2`
<int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 3792 0 2 3 8 2 1
2 8060 1 1 2 7 0 1
3 13098 0 0 2 10 0 0
Upvotes: 1