Sem
Sem

Reputation: 53

Frequency of each unique combination in data frame

In a dataset (N=6000) I would like to analyse how often combinations of (15 dummy)variables occur.

ID       Var1        Var2       Var3    Var15

1          1          0          0        1

2          0          1          1        1

3          1          0          0        0

6000       1          0          0        0

For this example what I would like to see is that the combination 1000 occurs twice, 1001 occurs once, and 0111 occurs also once.

The only way I can think up is compute a variable for each possible combination...

Is there an elegant and efficient way to do this?

I have read through How to summarize all possible combinations of variables? But that is a slightly different question and Aggregating Tally counters transcends my knowledge (but if that is the answer to my question, I will go through it).

Upvotes: 4

Views: 5040

Answers (5)

989
989

Reputation: 12937

A plyr option:

plyr::count(df[-1])

#   Var1 Var2 Var3 Var15 freq
#1    0    1    1     1    1
#2    1    0    0     0    2
#3    1    0    0     1    1

Upvotes: 1

Andy W
Andy W

Reputation: 5089

A base R solution is to use table. Using AntoniosK's example df:

df = read.table(text = "
ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0
", header=T)

res <- as.data.frame(table(df[,2:5]))

#> res
#   Var1 Var2 Var3 Var15 Freq
#1     0    0    0     0    0
#2     1    0    0     0    2
#3     0    1    0     0    0
#4     1    1    0     0    0
#5     0    0    1     0    0
#6     1    0    1     0    0
#7     0    1    1     0    0
#8     1    1    1     0    0
#9     0    0    0     1    0
#10    1    0    0     1    1
#11    0    1    0     1    0
#12    1    1    0     1    0
#13    0    0    1     1    0
#14    1    0    1     1    0
#15    0    1    1     1    1
#16    1    1    1     1    0

Unlike the other examples, this gives you zero frequencies. If you do not want those zero frequency examples, then you can simply do:

res[res$Freq>0,]

Upvotes: 2

eli-k
eli-k

Reputation: 11310

The SPSS solution is also to simply aggregate. The following syntax will put the count in a new dataset:

DATASET DECLARE comb.
AGGREGATE  /OUTFILE='comb' /BREAK=var1 to var15 /Noccurences=N.

You can also add the count to the present dataset like this:

AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=var1 var2 var3 /Noccurences=N.

Upvotes: 1

AntoniosK
AntoniosK

Reputation: 16121

You can just use count like this:

df = read.table(text = "
ID       Var1        Var2       Var3    Var15
1          1          0          0        1
2          0          1          1        1
3          1          0          0        0
6000       1          0          0        0
", header=T)

library(dplyr)

df %>% count(Var1, Var2, Var3, Var15)

# # A tibble: 3 x 5
#     Var1  Var2  Var3 Var15     n
#    <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1

Or use count_ if you don't want to type (many) column names:

input_names = names(df)[-1]  # select all column names apart from 1st one

df %>% count_(input_names)

# # A tibble: 3 x 5
#    Var1  Var2  Var3 Var15     n
#   <int> <int> <int> <int> <int>
# 1     0     1     1     1     1
# 2     1     0     0     0     2
# 3     1     0     0     1     1

If you want to group your variables and create a single (combo) variable you can do this:

library(dplyr)
library(tidyr)

input_names = names(df)[-1]

df %>% count_(input_names) %>% unite_("ComboVar",input_names,sep="")

# # A tibble: 3 x 2
#   ComboVar     n
# * <chr>    <int>
# 1 0111         1
# 2 1000         2
# 3 1001         1

Upvotes: 4

Constantinos
Constantinos

Reputation: 1327

Using the dplyr package, you could have:

library(dplyr)
df %>% group_by(Var1, Var2, Var3, Var15) %>% tally

Upvotes: 3

Related Questions