Reputation: 371
I have a dataframe holding survey results with the following columns:
1) number of unanswered questions,
2) number of times the respondent answered with the most common (consensus) response,
3) number of questions answered, and
4) the percentage of questions in which the respondent answered with the consensus response.
I want to sort this by the last column (percent consensus answers) and select the highest quintile. I can't seem to sort it, though.
Here's the str():
> str(consensus_participant_totals)
'data.frame': 210 obs. of 5 variables:
$ V1 : Factor w/ 210 levels "R_06xJVSOuTuhYLOt",..: 1 2 3 4 5 6 7 8 9 10 ...
$ num_nas : num 0 0 0 0 0 0 0 0 0 0 ...
$ num_cons : num 61 61 54 54 52 55 57 52 41 60 ...
$ num_answered : num 68 68 68 68 68 68 68 68 68 68 ...
$ pct_consensus: num 0.868 0.794 0.735 0.809 0.779 ...
Here are the first few lines:
consensus_participant_totals
V1 num_nas num_cons num_answered pct_consensus
1 R_06xJVSOuTuhYLOt 0 61 68 0.8676471
2 R_09aLjPFNmYMmsbX 0 61 68 0.7941176
3 R_0AphAH5kJRGOFfL 0 54 68 0.7352941
4 R_0cTBiuOmRWuFCZL 0 54 68 0.8088235
5 R_0dBEYzi8C7A65P7 0 52 68 0.7794118
6 R_0dCNkauEqyd2Y97 0 55 68 0.8529412
When I try:
consensus_participant_totals[order(pct_consensus),]
I get
Error in order(pct_consensus) : object 'pct_consensus' not found
which suggests that I have to put it in quotes (which nobody seems to do in the examples, and I don't get why).
When I try it with quotes, I just get the first row:
consensus_participant_totals[order("pct_consensus"),]
V1 num_nas num_cons num_answered pct_consensus
1 R_06xJVSOuTuhYLOt 0 61 68 0.8676471
What am I doing wrong? How can I sort by "pct_consensus"?
Thanks for any pointers!
Upvotes: 0
Views: 2591
Reputation: 146030
Your problem is that you can't refer to a column without specifying the data frame it belongs to. I'll use a built-in data set as an example, both so anyone can run it and because its name has a lot fewer characters.
Everyone has the iris data set:
head(iris)
The first column is Sepal.Length. It's in the data set, but it doesn't exist in your workspace except as part of the data set:
head(iris$Sepal.Length)
head(iris[, "Sepal.Length"])
head(Sepal.Length) # error, no Sepal.Length
So, when you sort based on a column, you must tell R where the column is. There are many ways to do this:
iris[order(iris$Sepal.Length), ]
iris[order(iris[, "Sepal.Length"]), ]
iris[order(iris[["Sepal.Length"]]), ] # [[ ]] extracts the column as a vector; single [ ] returns a one-column data frame
with(iris, iris[order(Sepal.Length), ])
But the data frame can't be left out:
iris[order(Sepal.Length), ] # errors out
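Applied to the problem in the question, a minimal sketch might look like the following. The data frame `totals` and its values are made up here as a stand-in for consensus_participant_totals, and "highest quintile" is read as "at or above the 80th percentile", which is only one possible interpretation.

```r
# Hypothetical stand-in for consensus_participant_totals (values are made up)
totals <- data.frame(
  V1 = paste0("R_", 1:10),
  num_cons = c(61, 54, 54, 53, 52, 58, 57, 50, 41, 60),
  num_answered = 68
)
totals$pct_consensus <- totals$num_cons / totals$num_answered

# Sort from highest to lowest, naming the data frame explicitly inside order()
sorted <- totals[order(totals$pct_consensus, decreasing = TRUE), ]

# Keep the rows at or above the 80th percentile of pct_consensus
cutoff <- quantile(totals$pct_consensus, probs = 0.8)
top_quintile <- sorted[sorted$pct_consensus >= cutoff, ]
```

If ties at the cutoff matter, taking the first `ceiling(nrow(sorted) / 5)` rows of `sorted` would be another reasonable reading.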
In figuring out things like this, remember that you can run tiny snippets of R code. You say
> when I try it with quotes, I just get the first row:
consensus_participant_totals[order("pct_consensus"),]
This is because you're ordering a character vector of length 1. If you run
order("pct_consensus")
it's equivalent to these:
order("a")
order("whatever string you put here")
They all return 1, because you're asking "If I sort a single string alphabetically, what position should it be in?" With only one string, the answer is always 1. So that's why you get the first row.
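A tiny snippet along these lines shows the difference between ordering a string and ordering the values themselves. The vector `x` is made up for illustration.

```r
# A single string can only be in position 1, whatever it says
ord_string <- order("pct_consensus")

# Made-up numeric values, standing in for a column of the data frame
x <- c(0.87, 0.79, 0.74, 0.81)

# order() returns the positions that would put x in ascending order
ord_values <- order(x)

# Indexing with those positions gives the sorted values
sorted_x <- x[ord_values]
```

The same indexing idea is what sorts the rows of a data frame: the positions go in the row slot of `[ , ]`.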
Upvotes: 3
Reputation: 371
Well, after playing around, it worked with the following:
c2 <- consensus_participant_totals[c(order(consensus_participant_totals[,"pct_consensus"])),]
It seems I can put the concat either before or after the order function (e.g. c(order(consensus_... or order(c(consensus... in the above).
This doesn't look much like most of the tutorials. I think my error was that most of the tutorials use the "attach" function, which avoids having to put the dataframe name in the command, and I ignored that. Thus, simply doing (for dataframe x) x[order(y),] won't work because I didn't attach x.
I think....
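As a quick check of the above, here is a small sketch on the built-in iris data. It suggests that the c() wrapper makes no difference to the result, and that with() is one way to get tutorial-style column names without attaching anything. The variable names are only for illustration.

```r
# Ordering by a column, with the data frame named explicitly
by_col <- iris[order(iris[, "Sepal.Length"]), ]

# Same thing wrapped in c(); c() of a single vector returns it unchanged
with_c <- iris[c(order(iris[, "Sepal.Length"])), ]
same_result <- identical(by_col, with_c)

# with() evaluates the expression inside iris, so the bare column name resolves
via_with <- with(iris, iris[order(Sepal.Length), ])
```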
Upvotes: 0