Reputation: 123
Through various Coursera projects, I have seen that you can order a data frame incorrectly if you don't verify that the column you are ordering by has been converted to numeric form. For example, when I ordered a column of numbers stored as a character vector, R returned this as the ascending order: 18.9, 19.1, 9.8, 9.9.
Is there a best practice for ordering? Had I not been doing this on a multiple-choice test, I might never have noticed the wrong order. Would professionals always ensure that a column is numeric before ordering by it?
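A minimal reproduction of the behavior described, as a sketch only: the vector name and values are made up for illustration, chosen to match the numbers quoted in the question.

```r
# Hypothetical values, stored as character strings rather than numbers
x_chr <- c("9.8", "18.9", "9.9", "19.1")

# Character vectors sort string-wise, comparing one character at a time,
# so every string starting with "1" lands before every string starting with "9"
sorted_chr <- sort(x_chr)
print(sorted_chr)

# After conversion, the same values sort by magnitude
sorted_num <- sort(as.numeric(x_chr))
print(sorted_num)
```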
Upvotes: 0
Views: 50
Reputation: 145775
Best practice is to use correct data types: a column of numbers should be class numeric, not class character. You should check your data types when you read data in to ensure this. This is not only because of problems when ordering data, but more importantly because of bugs and errors in calculations.
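One way such a check might look, as a rough sketch: the data frame `df` and its columns `id` and `score` are hypothetical stand-ins for whatever was read in, and the right conversion depends on how the data actually arrived.

```r
# Hypothetical data frame; `score` holds numbers stored as character strings
df <- data.frame(
  id = c("a", "b", "c", "d"),
  score = c("9.8", "18.9", "9.9", "19.1"),
  stringsAsFactors = FALSE
)

# Inspect the class of every column right after reading the data in
col_classes <- sapply(df, class)
print(col_classes)

# Convert the column; as.numeric() yields NA (with a warning) for values
# it cannot parse, so check for NAs before trusting the result
df$score <- as.numeric(df$score)
stopifnot(!anyNA(df$score))

# order() returns the row permutation; indexing with it gives a sorted copy
# and leaves df itself in its original row order
df_sorted <- df[order(df$score), ]
print(df_sorted)
```

`str(df)` gives a similar overview of column types interactively. When reading from a file, the `colClasses` argument of `read.csv()` can declare the types up front instead of converting afterwards.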
As for ordering, it is usually only necessary when displaying data in a table. Another best practice is to avoid re-ordering data unnecessarily, mostly because sorting large data can be expensive.
Upvotes: 2