Reputation: 13
I am currently trying to run a MaxDiff analysis in R. I have found a case study on how to run such an analysis in R, and I am trying to use similar code on my own dataset (which at the moment is a limited number of trial observations).
Everything runs smoothly until I hit this point:
#Computing individual-level ranks from counts
set.seed(0) # setting the random number seed to enhance comparability
indidualCountsNoTies = individualCounts + matrix(runif(n * nAlternatives)/100000, n) #adding random numbers to break ties
ranks = nAlternatives + 1 - apply(indidualCountsNoTies,1,rank) #ranks
rankProportions = t(apply(ranks,1,table) / n * 100)
Error in apply(ranks, 1, table)/n : non-numeric argument to binary operator
By running only
t(apply(ranks,1,table))
I noticed the result is effectively not a matrix, but this. I have no clue why this is the case, however.
SerCli Recensioni Spedizione SafePay EasyBuy Garanzia Prezzo
[1,] Integer,6 Integer,7 Integer,5 Integer,7 Integer,5 Integer,6 Integer,7
These are all the previous steps I'm taking, which seem to work alright
itData = read.spss("C:\\Users\\ricro\\Desktop\\test.sav", use.value.labels = FALSE, to.data.frame = TRUE)
# Selecting the variables containing the max-diff data
z = itData[,-1:-17]
# stacking the data (one set per row)
alternativeNames = c("SerCli","Recensioni","Spedizione","SafePay","EasyBuy","Garanzia","Prezzo")
nAlternatives = length(alternativeNames)
nBlocks = ncol(z) / nAlternatives
nAltsPerSet = 3
n = nrow(z)
nObservations = n * nBlocks
itMaxDiffData = matrix(as.numeric(t(z)),ncol = nAlternatives,byrow = TRUE,
dimnames = list(1:nObservations, alternativeNames))
#Computing overall counts
counts = apply(itMaxDiffData, 2, mean, na.rm = TRUE)
ranks = nAlternatives + 1 - rank(counts)
cbind(Counts = counts, Ranks = ranks)
#Computing individual-level counts
id = rep(1:n,rep(nBlocks,n))
individualCounts = aggregate(itMaxDiffData,list(id),mean, na.rm = TRUE)[,-1]
round(individualCounts[1:10,],1) #show data for first 10 respondents
Oddly enough, this error occurs only when running the code using my own .sav file, whereas using the sample data that comes with the code this does not happen, even if I manually edit some parts of it. This makes me think that the error might come from the structure of my SPSS file, but I'm really unsure about it.
Thanks in advance!
EDIT: once loaded in R and transformed into a workable matrix, this is the dataset I am working with
SerCli Recensioni Spedizione SafePay EasyBuy Garanzia Prezzo
1 1 0 -1 NA NA NA NA
2 -1 NA NA 1 0 NA NA
3 0 NA NA NA NA 1 -1
4 NA 0 NA -1 NA 1 NA
5 NA 1 NA NA -1 NA 0
6 NA NA 1 0 NA NA -1
7 -1 NA NA NA 1 0 NA
8 1 -1 0 NA NA NA NA
9 0 NA NA -1 1 NA NA
10 0 NA NA NA NA -1 1
11 NA -1 NA 0 NA 1 NA
12 NA -1 NA NA 1 NA 0
13 NA NA 1 0 NA NA -1
14 0 NA NA NA 1 -1 NA
15 0 -1 1 NA NA NA NA
16 0 NA NA -1 1 NA NA
17 0 NA NA NA NA 1 -1
18 NA 1 NA 0 NA -1 NA
19 NA 0 NA NA 1 NA -1
20 NA NA 1 0 NA NA -1
21 0 NA NA NA 1 -1 NA
22 1 -1 0 NA NA NA NA
23 1 NA NA -1 0 NA NA
24 1 NA NA NA NA 0 -1
25 NA 1 NA -1 NA 0 NA
26 NA -1 NA NA 0 NA 1
27 NA NA 1 -1 NA NA 0
28 1 NA NA NA 0 -1 NA
29 -1 1 0 NA NA NA NA
30 0 NA NA -1 1 NA NA
31 -1 NA NA NA NA 0 1
32 NA 1 NA -1 NA 0 NA
33 NA 0 NA NA -1 NA 1
34 NA NA 0 -1 NA NA 1
35 0 NA NA NA 1 -1 NA
36 1 0 -1 NA NA NA NA
37 0 NA NA 1 -1 NA NA
38 1 NA NA NA NA -1 0
39 NA 0 NA 1 NA -1 NA
40 NA 1 NA NA 0 NA -1
41 NA NA -1 1 NA NA 0
42 1 NA NA NA -1 0 NA
43 -1 1 0 NA NA NA NA
44 -1 NA NA 0 1 NA NA
45 0 NA NA NA NA 1 -1
46 NA 1 NA -1 NA 0 NA
47 NA 0 NA NA -1 NA 1
48 NA NA 1 0 NA NA -1
49 1 NA NA NA 0 -1 NA
50 1 -1 0 NA NA NA NA
51 0 NA NA 1 -1 NA NA
52 0 NA NA NA NA 1 -1
53 NA 1 NA 0 NA -1 NA
54 NA 1 NA NA -1 NA 0
55 NA NA 1 -1 NA NA 0
56 1 NA NA NA 0 -1 NA
57 0 1 -1 NA NA NA NA
58 0 NA NA 1 -1 NA NA
59 -1 NA NA NA NA 1 0
60 NA 1 NA -1 NA 0 NA
61 NA 1 NA NA 0 NA -1
62 NA NA 0 -1 NA NA 1
63 1 NA NA NA -1 0 NA
64 0 1 -1 NA NA NA NA
65 -1 NA NA 1 0 NA NA
66 0 NA NA NA NA 1 -1
67 NA -1 NA 1 NA 0 NA
68 NA -1 NA NA 0 NA 1
69 NA NA 0 -1 NA NA 1
70 1 NA NA NA 0 -1 NA
71 -1 1 0 NA NA NA NA
72 -1 NA NA 0 1 NA NA
73 1 NA NA NA NA -1 0
74 NA 0 NA 1 NA -1 NA
75 NA -1 NA NA 0 NA 1
76 NA NA 1 -1 NA NA 0
77 1 NA NA NA -1 0 NA
78 0 -1 1 NA NA NA NA
79 0 NA NA 1 -1 NA NA
80 1 NA NA NA NA 0 -1
81 NA 0 NA 1 NA -1 NA
82 NA 1 NA NA -1 NA 0
83 NA NA 1 -1 NA NA 0
84 1 NA NA NA -1 0 NA
85 0 1 -1 NA NA NA NA
86 -1 NA NA 1 0 NA NA
87 0 NA NA NA NA -1 1
88 NA 1 NA 0 NA -1 NA
89 NA 1 NA NA -1 NA 0
90 NA NA -1 1 NA NA 0
91 0 NA NA NA 1 -1 NA
92 1 -1 0 NA NA NA NA
93 0 NA NA 1 -1 NA NA
94 1 NA NA NA NA -1 0
95 NA 1 NA 0 NA -1 NA
96 NA -1 NA NA 0 NA 1
97 NA NA 1 -1 NA NA 0
98 1 NA NA NA -1 0 NA
This, instead, is the ranks matrix I'm trying to execute apply on:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14]
SerCli 6 3 4 1 6 2 5 1 3 5 6 2 5 1
Recensioni 2 7 3 5 2 3 1 3 1 6 5 4 1 5
Spedizione 4 2 2 2 4 7 2 2 6 7 1 1 7 2
SafePay 3 5 6 7 7 1 6 4 5 2 3 3 2 4
EasyBuy 5 1 1 4 3 5 4 7 7 4 4 7 4 7
Garanzia 1 6 5 6 5 6 3 5 2 3 7 6 6 6
Prezzo 7 4 7 3 1 4 7 6 4 1 2 5 3 3
EDIT #2
Okay, we're getting there. Just a pair of things, hoping to finally solve this. I haven't actually loaded tidyverse, and I defined n previously as n <- nrow(z), i.e. the number of observations from the dataset. Anyhow, here's the output for dput(ranks):
structure(c(6, 2, 4, 3, 5, 1, 7, 3, 7, 2, 5, 1, 6, 4, 4, 3, 2,
6, 1, 5, 7, 1, 5, 2, 7, 4, 6, 3, 6, 2, 4, 7, 3, 5, 1, 2, 3, 7,
1, 5, 6, 4, 5, 1, 2, 6, 4, 3, 7, 1, 3, 2, 4, 7, 5, 6, 3, 1, 6,
5, 7, 2, 4, 5, 6, 7, 2, 4, 3, 1, 6, 5, 1, 3, 4, 7, 2, 2, 4, 1,
3, 7, 6, 5, 5, 1, 7, 2, 4, 6, 3, 1, 5, 2, 4, 7, 6, 3), .Dim = c(7L,
14L), .Dimnames = list(c("SerCli", "Recensioni", "Spedizione",
"SafePay", "EasyBuy", "Garanzia", "Prezzo"), NULL))
Upvotes: 0
Reputation: 263451
Let's start at the top of your question where you say:
By running only
t(apply(ranks,1,table)) I noticed the result is effectively not a matrix, but this. I have no clue why this is the case, however.
SerCli Recensioni Spedizione SafePay EasyBuy Garanzia Prezzo
[1,] Integer,6 Integer,7 Integer,5 Integer,7 Integer,5 Integer,6 Integer,7
That output indicates that instead of a numeric matrix you got a matrix of list elements. That's because apply got a different number of items for different rows as it looped over them: the table function was returning vectors of varying lengths, so apply was unable to build a proper matrix.
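To see this behaviour in isolation, here is a minimal sketch on a made-up toy matrix (the names m, res, err and res_fixed are only illustrative, not from your script). It shows apply falling back to a list when table returns a different length per row, the arithmetic error that this then triggers, and one possible way around it by declaring the factor levels up front:

```r
# Toy matrix (hypothetical data): row "a" has two distinct values, row "b" has three
m <- matrix(c(1, 1, 2,
              1, 2, 3),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a", "b"), NULL))

# table() returns vectors of length 2 and 3, so apply() cannot form a matrix
res <- apply(m, 1, table)
is.list(res)    # a list, not a matrix
lengths(res)    # number of distinct values found in each row

# Arithmetic on a list fails; capture the message instead of stopping
err <- tryCatch(res / 2, error = function(e) conditionMessage(e))
err

# Declaring the full set of levels makes every table the same length
res_fixed <- apply(m, 1, function(x) table(factor(x, levels = 1:3)))
is.matrix(res_fixed)
```

With a fixed set of levels, every call returns a vector of the same length, so apply can bind the results into an ordinary matrix again.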
So that's the "why". To build a proper "how", we need a dataset we can work with.
After edit:
It would have been easier if you had used dput to display an ASCII version of the internal structure of that ranks matrix, but here is how I reconstructed it:
ranks <- matrix( scan(text="6 3 4 1 6 2 5 1 3 5 6 2 5 1
2 7 3 5 2 3 1 3 1 6 5 4 1 5
4 2 2 2 4 7 2 2 6 7 1 1 7 2
3 5 6 7 7 1 6 4 5 2 3 3 2 4
5 1 1 4 3 5 4 7 7 4 4 7 4 7
1 6 5 6 5 6 3 5 2 3 7 6 6 6
7 4 7 3 1 4 7 6 4 1 2 5 3 3"), ncol=14)
nms <- scan(text="SerCli Recensioni Spedizione SafePay EasyBuy Garanzia Prezzo ",what="")
#Read 7 items
rownames(ranks) <- nms
Then I looked at:
ranktab = apply(ranks,1,table)
ranktab
Which, contrary to my earlier suggestion, was an ordinary numeric matrix. The error was being thrown because you had not defined an n, and I'm guessing you had the tidyverse super-package loaded. It has an n function, which is not numeric. You probably want this, although it's kind of boring:
rankProportions = t(apply(ranks,1,table) / length(ranks) * 100)
> rankProportions
1 2 3 4 5 6 7
SerCli 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
Recensioni 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
Spedizione 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
SafePay 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
EasyBuy 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
Garanzia 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
Prezzo 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816 2.040816
So now it's even more important that you post dput(ranks), since the output you gave me was not causing the error you reported.
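One possible explanation for the mismatch, offered as a sketch rather than a certainty: scan reads the pasted text row by row, whereas matrix fills column by column unless byrow = TRUE is given, so a reconstruction without byrow = TRUE may not be the matrix printed in the question. The names txt, vals, ranks_colfill and ranks_byrow below are illustrative only:

```r
# Printed matrix from the question, pasted row by row
txt <- "6 3 4 1 6 2 5 1 3 5 6 2 5 1
2 7 3 5 2 3 1 3 1 6 5 4 1 5
4 2 2 2 4 7 2 2 6 7 1 1 7 2
3 5 6 7 7 1 6 4 5 2 3 3 2 4
5 1 1 4 3 5 4 7 7 4 4 7 4 7
1 6 5 6 5 6 3 5 2 3 7 6 6 6
7 4 7 3 1 4 7 6 4 1 2 5 3 3"
nms <- c("SerCli", "Recensioni", "Spedizione", "SafePay",
         "EasyBuy", "Garanzia", "Prezzo")
vals <- scan(text = txt, quiet = TRUE)

# Default column-wise fill: values end up in different cells than printed
ranks_colfill <- matrix(vals, ncol = 14, dimnames = list(nms, NULL))

# Row-wise fill: first column should read 6 2 4 3 5 1 7, as in the question
ranks_byrow <- matrix(vals, ncol = 14, byrow = TRUE,
                      dimnames = list(nms, NULL))
ranks_byrow[, 1]

# Compare what apply(..., table) returns for the two versions
tab_colfill <- apply(ranks_colfill, 1, table)
tab_byrow   <- apply(ranks_byrow, 1, table)
is.matrix(tab_colfill)   # every rank appears equally often in each row
is.list(tab_byrow)       # some ranks are missing from some rows
```

In the column-wise version each row happens to collect the complete rankings of two respondents, which would account for the uniform proportions shown above.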
After the dput output was provided:
So now my original hypothesis is confirmed, and the question is how to get counts of vector values that span a predetermined range, in this case 1:7. The tabulate function is sometimes a better choice than table, since it lets you specify the number of bins (ranks2 below holds the matrix from your dput output):
t( apply(ranks2,1,tabulate, nbins=7 ) )
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
SerCli 3 2 2 1 3 3 0
Recensioni 3 2 3 1 3 1 1
Spedizione 2 6 0 2 0 1 3
SafePay 1 2 3 2 2 2 2
EasyBuy 2 0 1 5 2 0 4
Garanzia 1 1 2 0 3 6 1
Prezzo 2 1 3 3 1 1 3
.... and now your calculation of table proportions should "proceed apace".
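Putting it together, here is a hedged sketch of the whole proportions step, using the matrix from your dput output. It assumes that each column of ranks is one respondent, so the divisor is ncol(ranks); the name rankCounts is introduced here only for illustration:

```r
# The ranks matrix exactly as given by dput(ranks) in the question
ranks <- structure(c(6, 2, 4, 3, 5, 1, 7, 3, 7, 2, 5, 1, 6, 4, 4, 3, 2,
6, 1, 5, 7, 1, 5, 2, 7, 4, 6, 3, 6, 2, 4, 7, 3, 5, 1, 2, 3, 7,
1, 5, 6, 4, 5, 1, 2, 6, 4, 3, 7, 1, 3, 2, 4, 7, 5, 6, 3, 1, 6,
5, 7, 2, 4, 5, 6, 7, 2, 4, 3, 1, 6, 5, 1, 3, 4, 7, 2, 2, 4, 1,
3, 7, 6, 5, 5, 1, 7, 2, 4, 6, 3, 1, 5, 2, 4, 7, 6, 3), .Dim = c(7L,
14L), .Dimnames = list(c("SerCli", "Recensioni", "Spedizione",
"SafePay", "EasyBuy", "Garanzia", "Prezzo"), NULL))

nAlternatives <- nrow(ranks)  # 7 alternatives, one per row
n <- ncol(ranks)              # assumption: one column per respondent

# Count how often each alternative received each rank, always using 7 bins
rankCounts <- t(apply(ranks, 1, tabulate, nbins = nAlternatives))
colnames(rankCounts) <- seq_len(nAlternatives)

# Convert counts to percentages of respondents; each row sums to 100
rankProportions <- rankCounts / n * 100
round(rankProportions, 1)
```

Because tabulate always returns nbins values, ranks that never occur for an alternative show up as zeros instead of being dropped, which keeps the result a regular matrix.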
Upvotes: 2