Tan
Tan

Reputation: 53

Grouping and Counting instances?

Is it possible to group and count instances of all other columns using R (dplyr)? For example, The following dataframe

x  a  b  c
1  0  0  0
1  1  0  1
1  2  2  1
2  1  2  1

Turns to this (note: y is value that is being counted)

EDIT:- explaining the transformation, x is what I'm grouping by, for each number grouped, i want to count how many times 0 and 1 and 2 was mentioned, as in the first row in the transformed dataframe, we counted how many times x = 1 was equal to 0 in the other columns (y), so 0 was in column a one time, column b two times and column c one time

x  y  a  b  c
1  0  1  2  1
1  1  1  0  2
1  2  1  1  0
2  1  1  0  1
2  2  0  1  0

Upvotes: 3

Views: 252

Answers (2)

Jaap
Jaap

Reputation: 83275

An approach with a combination of the melt and dcast functions of data.table or reshape2:

library(data.table) # v1.9.5+
dt.new <- dcast(melt(setDT(df), id.vars="x"), x + value ~ variable)

this gives:

dt.new
#    x value a b c
# 1: 1     0 1 2 1
# 2: 1     1 1 0 2
# 3: 1     2 1 1 0
# 4: 2     1 1 0 1
# 5: 2     2 0 1 0

In dcast you can specify which aggregation function to use, but this is in this case not necessary as the default aggregation function is length. Without using an aggregation function, you will get a warning about that:

Aggregation function missing: defaulting to length

Furthermore, if you do not explicitly convert the dataframe to a data table, data.table will redirect to reshape2 (see the explanation from @Arun in the comments). Consequently this method can be used with reshape2 as well:

library(reshape2)
df.new <- dcast(melt(df, id.vars="x"), x + value ~ variable)

Used data:

df <- read.table(text="x  a  b  c
1  0  0  0
1  1  0  1
1  2  2  1
2  1  2  1", header=TRUE)

Upvotes: 5

Paul Hiemstra
Paul Hiemstra

Reputation: 60984

I'd use a combination of gather and spread from the tidyr package, and count from dplyr:

library(dplyr)
library(tidyr)
df = data.frame(x = c(1,1,1,2), a = c(0,1,2,1), b = c(0,0,2,2), c = c(0,1,1,1))
res = df %>% 
    gather(variable, value, -x) %>% 
    count(x, variable, value) %>% 
    spread(variable, n, fill = 0)
# Source: local data frame [5 x 5]
#
#   x value a b c
# 1 1     0 1 2 1
# 2 1     1 1 0 2
# 3 1     2 1 1 0
# 4 2     1 1 0 1
# 5 2     2 0 1 0

Essentially, you first change the format of the dataset to:

head(df %>% 
    gather(variable, value, -x))
#  x variable value
#1 1        a     0
#2 1        a     1
#3 1        a     2
#4 2        a     1
#5 1        b     0
#6 1        b     0

Which allows you to use count to get the information regarding how often certain values occur in columns a to c. After that, you reformat the dataset to your required format using spread.

Upvotes: 4

Related Questions