Reputation: 53
In a dataset (N=6000) I would like to analyse how often combinations of (15 dummy)variables occur.
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
For this example what I would like to see is that the combination 1000 occurs twice, 1001 occurs once, and 0111 occurs also once.
The only way I can think up is compute a variable for each possible combination...
Is there an elegant and efficient way to do this?
I have read through How to summarize all possible combinations of variables? But that is a slightly different question and Aggregating Tally counters transcends my knowledge (but if that is the answer to my question, I will go through it).
Upvotes: 4
Views: 5040
Reputation: 12937
A plyr
option:
plyr::count(df[-1])
# Var1 Var2 Var3 Var15 freq
#1 0 1 1 1 1
#2 1 0 0 0 2
#3 1 0 0 1 1
Upvotes: 1
Reputation: 5089
A base R solution is to use table
. Using AntoniosK's example df:
df = read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header=T)
res <- as.data.frame(table(df[,2:5]))
#> res
# Var1 Var2 Var3 Var15 Freq
#1 0 0 0 0 0
#2 1 0 0 0 2
#3 0 1 0 0 0
#4 1 1 0 0 0
#5 0 0 1 0 0
#6 1 0 1 0 0
#7 0 1 1 0 0
#8 1 1 1 0 0
#9 0 0 0 1 0
#10 1 0 0 1 1
#11 0 1 0 1 0
#12 1 1 0 1 0
#13 0 0 1 1 0
#14 1 0 1 1 0
#15 0 1 1 1 1
#16 1 1 1 1 0
Unlike the other examples, this gives you zero frequencies. If you do not want those zero frequency examples, then you can simply do:
res[res$Freq>0,]
Upvotes: 2
Reputation: 11310
The SPSS solution is also to simply aggregate. The following syntax will put the count in a new dataset:
DATASET DECLARE comb.
AGGREGATE /OUTFILE='comb' /BREAK=var1 to var15 /Noccurences=N.
You can also add the count to the present dataset like this:
AGGREGATE /OUTFILE=* MODE=ADDVARIABLES /BREAK=var1 var2 var3 /Noccurences=N.
Upvotes: 1
Reputation: 16121
You can just use count
like this:
df = read.table(text = "
ID Var1 Var2 Var3 Var15
1 1 0 0 1
2 0 1 1 1
3 1 0 0 0
6000 1 0 0 0
", header=T)
library(dplyr)
df %>% count(Var1, Var2, Var3, Var15)
# # A tibble: 3 x 5
# Var1 Var2 Var3 Var15 n
# <int> <int> <int> <int> <int>
# 1 0 1 1 1 1
# 2 1 0 0 0 2
# 3 1 0 0 1 1
Or use count_
if you don't want to type (many) column names:
input_names = names(df)[-1] # select all column names apart from 1st one
df %>% count_(input_names)
# # A tibble: 3 x 5
# Var1 Var2 Var3 Var15 n
# <int> <int> <int> <int> <int>
# 1 0 1 1 1 1
# 2 1 0 0 0 2
# 3 1 0 0 1 1
If you want to group your variables and create a single (combo) variable you can do this:
library(dplyr)
library(tidyr)
input_names = names(df)[-1]
df %>% count_(input_names) %>% unite_("ComboVar",input_names,sep="")
# # A tibble: 3 x 2
# ComboVar n
# * <chr> <int>
# 1 0111 1
# 2 1000 2
# 3 1001 1
Upvotes: 4
Reputation: 1327
Using the dplyr
package, you could have:
library(dplyr)
df %>% group_by(Var1, Var2, Var3, Var15) %>% tally
Upvotes: 3