William Oliver

Reputation: 371

Why does R order function not work on this dataframe -- returns only first (unsorted) value

I have a dataframe holding survey results with the following columns:

 1) number of unanswered questions, 
 2) number of times the respondent answered with the most common (consensus) response, 
 3) number of questions answered, and 
 4) the percentage of questions in which the respondent answered with the consensus response.  

I want to sort this by the last column (percent consensus answers) and select the highest quintile. I can't seem to sort it, though.

Here's the str():

> str(consensus_participant_totals)
'data.frame':   210 obs. of  5 variables:
 $ V1           : Factor w/ 210 levels "R_06xJVSOuTuhYLOt",..: 1 2 3 4 5 6 7 8 9 10 ...
 $ num_nas      : num  0 0 0 0 0 0 0 0 0 0 ...
 $ num_cons     : num  61 61 54 54 52 55 57 52 41 60 ...
 $ num_answered : num  68 68 68 68 68 68 68 68 68 68 ...
 $ pct_consensus: num  0.868 0.794 0.735 0.809 0.779 ...

Here's the first few lines:

consensus_participant_totals
                   V1 num_nas num_cons num_answered pct_consensus
1   R_06xJVSOuTuhYLOt       0       61           68     0.8676471
2   R_09aLjPFNmYMmsbX       0       61           68     0.7941176
3   R_0AphAH5kJRGOFfL       0       54           68     0.7352941
4   R_0cTBiuOmRWuFCZL       0       54           68     0.8088235
5   R_0dBEYzi8C7A65P7       0       52           68     0.7794118
6   R_0dCNkauEqyd2Y97       0       55           68     0.8529412

when I try:

consensus_participant_totals[order(pct_consensus),]

I get

Error in order(pct_consensus) : object 'pct_consensus' not found

which suggests that I have to put it in quotes (which nobody seems to do in the examples -- I don't get why)

when I try it with quotes, I just get the first row:

consensus_participant_totals[order("pct_consensus"),]             
                V1 num_nas num_cons num_answered pct_consensus
1 R_06xJVSOuTuhYLOt       0       61           68     0.8676471

What am I doing wrong? How can I sort by "pct_consensus"?

Thanks for any pointers!

Upvotes: 0

Views: 2591

Answers (2)

Gregor Thomas

Reputation: 146030

Your problem is that you can't refer to a column without specifying the data frame it belongs to. I'll use a built-in data set as the example, both so anyone can run it and because its name has far fewer characters.

Everyone has the iris data set:

head(iris)

The first column is Sepal.Length. It's in the data set, but it doesn't exist in your workspace except as part of the data set:

head(iris$Sepal.Length)
head(iris[, "Sepal.Length"])
head(Sepal.Length) # error, no Sepal.Length

So, when you sort based on a column, you must tell R where the column is. There are many ways to do this:

iris[order(iris$Sepal.Length), ]
iris[order(iris[, "Sepal.Length"]), ]
iris[order(iris["Sepal.Length"]), ]
with(iris, iris[order(Sepal.Length), ])

But you can't leave the data frame out:

iris[order(Sepal.Length), ]  # errors out
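Applied to the question, a minimal sketch might look like the following. The small data frame `totals` is a made-up stand-in for `consensus_participant_totals`, and reading "highest quintile" as "at or above the 80th percentile" is an assumption, not something the question spells out.

```r
# Hypothetical stand-in for consensus_participant_totals (values are made up)
totals <- data.frame(
  V1 = paste0("R_", 1:10),
  pct_consensus = c(0.87, 0.79, 0.74, 0.81, 0.78, 0.85, 0.84, 0.76, 0.60, 0.88)
)

# Sort ascending by pct_consensus, naming the data frame explicitly
sorted <- totals[order(totals$pct_consensus), ]

# Sort descending instead
sorted_desc <- totals[order(totals$pct_consensus, decreasing = TRUE), ]

# One possible reading of "highest quintile":
# keep the rows at or above the 80th percentile
cutoff <- quantile(totals$pct_consensus, probs = 0.8)
top_quintile <- totals[totals$pct_consensus >= cutoff, ]
```

With real data, replace `totals` with your own data frame. Ties at the cutoff can make the top group slightly larger than one fifth of the rows.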

In figuring out things like this, remember that you can run tiny snippets of R code. You say

> when I try it with quotes, I just get the first row:

consensus_participant_totals[order("pct_consensus"),]

This is because you're ordering a character vector of length 1. If you run

order("pct_consensus")

It's equivalent to these

order("a")
order("whatever string you put here")

They return 1, because you're asking "If I sort a single string alphabetically, what position should it be in?" With only one string, the answer is always 1. So that's why you get the first row.
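You can check this on iris as a small sketch (the variable names `one` and `first_only` are just for illustration):

```r
# order() on a single string has only one element to rank, so it returns 1
one <- order("pct_consensus")

# Indexing rows with that result therefore selects row 1 and nothing else
first_only <- iris[order("Sepal.Length"), ]

nrow(first_only)  # a single row, the same one as iris[1, ]
```

The string is never looked up as a column name. It is just a value being ordered.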

Upvotes: 3

William Oliver

Reputation: 371

Well, after playing around, it worked with the following:

c2 <- consensus_participant_totals[c(order(consensus_participant_totals[, "pct_consensus"])), ]

It seems I can put the c() call either before or after the order function (e.g. c(order(consensus_... or order(c(consensus... in the line above).
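A likely reason both placements behave the same is that c() applied to a single vector just returns that vector, so it may not be needed at all. A small sketch with made-up values:

```r
x <- c(0.87, 0.79, 0.74, 0.81)  # made-up values for illustration

plain   <- order(x)      # no extra c()
wrapped <- c(order(x))   # c() around the result of order()
inner   <- order(c(x))   # c() around the input to order()

identical(plain, wrapped)
identical(plain, inner)
```

If that holds for the real column too, the line above could presumably drop the c() entirely.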

This doesn't look much like most of the tutorials. I think my error was that most of the tutorials use the attach function, which avoids having to put the data frame name in the command, and I ignored that. Thus, simply doing x[order(y), ] for a data frame x won't work because I didn't attach x.

I think....
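To illustrate the attach point, here is a rough sketch with a made-up data frame x that has a column y. It compares attach() with with(), which gives the same shorthand but only for a single expression. attach() is often discouraged because attached columns can be masked by other objects of the same name.

```r
# Made-up data frame for illustration
x <- data.frame(id = c("a", "b", "c"), y = c(0.8, 0.5, 0.9))

# with() evaluates the expression with the columns of x visible
sorted_with <- with(x, x[order(y), ])

# attach() puts the columns of x on the search path until detach() is called
attach(x)
sorted_attach <- x[order(y), ]
detach(x)
```

Both give the rows of x sorted by y. Without either one, y has to be written as x$y or x[, "y"].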

Upvotes: 0
