Reputation: 5784
I tried to order a CSV file, but the rank() function behaves strangely on numbers written in E notation (e.g. 9.34E-07).
> comparison = read.csv("e:/thesis/comparison/output.csv", header=TRUE)
> comparison$proxygeneld_full.txt[0:20]
[1] 9.34E-07 4.04E-06 4.16E-06 7.17E-06 2.08E-05 3.00E-05
[7] 3.59E-05 4.16E-05 7.75E-05 9.50E-05 0.0001116 0.00012452
[13] 0.00015494 0.00017892 0.00017892 0.00018345 0.0002232 0.000231775
[19] 0.00023241 0.0002666
13329 Levels: 0.0001116 0.00012452 0.00015494 0.00017892 0.00018345 ... adjP
> rank(comparison$proxygeneld_full.txt[0:20])
[1] 19.0 14.0 16.0 17.0 11.0 12.0 13.0 15.0 18.0 20.0 1.0 2.0 3.0 4.5 4.5
[16] 6.0 7.0 8.0 9.0 10.0
#It should be 1-20 in order ....
It seems to just ignore the E notation there. It works fine if I'm not using data from the file:
> rank(c(9.34E-07, 4.04E-06, 7.17E-06))
[1] 1 2 3
Am I missing something? Thanks.
Upvotes: 0
Views: 300
Reputation: 3704
I guess you have some non-numeric data in your CSV file. What happens if you run this?
as.numeric(comparison$proxygeneld_full.txt)
If this produces different numbers than you expected, you certainly have some text in this column.
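One caveat: if the column has already been read as a factor, as.numeric() returns the integer level codes rather than the parsed values, so going through as.character() first is the safer check. A minimal sketch, assuming a small made-up vector `x` that stands in for the real column:

```r
# Hypothetical stand-in for comparison$proxygeneld_full.txt
x <- factor(c("9.34E-07", "4.04E-06", "0.0001116", "adjP"))

# Convert through character so the labels are parsed, not the level codes;
# entries that cannot be parsed become NA (warning suppressed on purpose)
vals <- suppressWarnings(as.numeric(as.character(x)))

# The entries that failed to parse are the offending text values
bad <- as.character(x)[is.na(vals)]
print(bad)
```

Anything printed here is text that stops R from treating the column as numeric.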
Upvotes: 1
Reputation: 174813
Yep, comparison$proxygeneld_full.txt[0:20] isn't even numeric. It is a factor:
13329 Levels: 0.0001116 0.00012452 0.00015494 0.00017892 0.00018345 ... adjP
So rank() is ranking the integer codes that lie behind the factor representation. The levels are sorted as character strings, so the E-0X "numbers" sort after the non-E numbers in the levels.
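To illustrate, here is a small sketch with four made-up values (not the real data) showing how ranking a factor differs from ranking the numbers it appears to contain:

```r
# Hypothetical values, stored as a factor the way read.csv() stored the column
x <- factor(c("9.34E-07", "4.04E-06", "0.0001116", "0.00012452"))

# Levels are sorted as strings, so "0.000..." comes before "4.04E-06" and "9.34E-07"
print(levels(x))

# rank() on a factor ranks the integer level codes
factor_ranks <- rank(x)

# Converting through character first ranks the actual numeric values
numeric_ranks <- rank(as.numeric(as.character(x)))

print(factor_ranks)
print(numeric_ranks)
```

The two rank vectors disagree because the string order of the labels is not the numeric order of the values.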
Look at str(comparison) and you'll see that proxygeneld_full.txt is a factor.
I'm struggling to replicate the behaviour you are seeing with E numbers in a CSV file. R reads them properly as numeric. Check your CSV to make sure you don't have any non-numeric values in that column, and that the E numbers are not quoted.
Ahh! Looking again at the levels you quote: there is an adjP lurking at the end of the output you show. Check your data again, as this adjP is in there somewhere, and that is forcing R to code that variable as a factor, hence the behaviour you see with ranking as I described above.
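One possible way to clean this up, sketched under assumptions: the inline `csv_text` below is a made-up stand-in for output.csv, and the stray adjP is assumed to be a single unwanted row (for example a repeated header) that can simply be dropped. If it marks something meaningful in your file, handle it differently.

```r
# Hypothetical CSV contents standing in for output.csv
csv_text <- "proxygeneld_full.txt
9.34E-07
4.04E-06
adjP
0.0001116"

# Read the column as plain character instead of a factor
dat <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# Parse the values; text such as "adjP" becomes NA
vals <- suppressWarnings(as.numeric(dat$proxygeneld_full.txt))

# Keep only the rows that parsed, and store the numeric values
clean <- dat[!is.na(vals), , drop = FALSE]
clean$proxygeneld_full.txt <- vals[!is.na(vals)]

# Ranking now follows the numeric order
ranks <- rank(clean$proxygeneld_full.txt)
print(ranks)
```

Fixing the source file so the column contains only numbers is the cleaner option; this only works around the problem after reading.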
Upvotes: 1