alki

Reputation: 3584

R: as.numeric data frame messing up order of values

I pasted the important parts of my code below. Basically, I am creating a data.frame in which two columns contain numeric values and one column contains factors.

I am trying to convert the "Location" column to numeric values, but once I do, the Location values change and no longer appear in order.

library(data.table)  # provides fread
library(stringi)     # provides stri_length
f <- fread("ABC.txt", header=FALSE, skip=1)$V1
f <- paste(f, collapse = "")

vector <- 1:stri_length(f)

fillmatrix <- c(rbind(strsplit(f, "")[[1]], vector))
A <- data.frame(1,matrix(fillmatrix, ncol=2, byrow = TRUE))
A <- A[c(1,3,2)]
colnames(A)=c("Track","Location","Base")

class(A$Track)
# [1] "factor"

A[1:15,]    # Before as.numeric
#   Track Location Base
# 1     1        1    A
# 2     1        2    C
# 3     1        3    G
# 4     1        4    G
# 5     1        5    A
# 6     1        6    A
# 7     1        7    T
# 8     1        8    A
# 9     1        9    A
# 10    1       10    A
# 11    1       11    A
# 12    1       12    T
# 13    1       13    T
# 14    1       14    C
# 15    1       15    C

a <- transform(A, Location = as.numeric(Location), Track = as.numeric(Track))

a[1:15,]     # After as.numeric
#   Track Location Base
# 1     1        1    A
# 2     1      112    C
# 3     1      223    G
# 4     1      334    G
# 5     1      445    A
# 6     1      556    A
# 7     1      667    T
# 8     1      679    A
# 9     1      690    A
# 10    1        2    A
# 11    1       13    A
# 12    1       24    T
# 13    1       35    T
# 14    1       46    C
# 15    1       57    C

The A data frame is fairly long, about 700 rows. Is the way I'm creating the data.frame the issue? Or am I overlooking a small mistake?

Thank you for your help.

Upvotes: 0

Views: 1497

Answers (1)

mathematical.coffee

Reputation: 56935

A reproducible example would be good.

I suspect it is because A$Location is a factor, not a character vector. In that case, you need as.numeric(as.character(Location)) to get the numbers you expect. R stores a factor as the integer codes 1:nlevels(your.factor), and those codes are assigned after sorting the levels as strings rather than as numbers (so "10" comes before "2"). Calling as.numeric directly on a factor returns those codes, not the original values.
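
To illustrate the point above, here is a minimal sketch. The vector `location` is a hypothetical stand-in for the Location column (it is not the asker's data), built as a factor from the strings "1" through "12".

```r
# Hypothetical stand-in for the Location column: numbers stored as a factor
location <- factor(as.character(1:12))

# Levels are sorted as strings, so "10" comes before "2"
print(levels(location))

# as.numeric on a factor returns the internal integer codes,
# i.e. the position of each value in the string-sorted levels
codes <- as.numeric(location)
print(codes)

# Going through character first recovers the original numbers
values <- as.numeric(as.character(location))
print(values)

# An equivalent idiom: look up the numeric levels by code
values2 <- as.numeric(levels(location))[location]
print(values2)
```

Here `codes` comes out scrambled in the same way as the "After as.numeric" output in the question, while `values` and `values2` are the intended 1 to 12.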

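You might set stringsAsFactors=FALSE in your data.frame call. In your fillmatrix <- ... line, combining the characters from strsplit(f, "") with the integer vector coerces everything to character, and data.frame then converts those character columns to factors when stringsAsFactors is TRUE. (Why do you paste your f together only to split it back out again?)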
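
As a rough sketch of that suggestion, the data frame could be built column by column so that Location never passes through a character matrix. The string assigned to `f` below is an assumption: it stands in for the contents of ABC.txt and simply reuses the first 15 bases shown in the question.

```r
# Hypothetical short sequence standing in for the pasted contents of ABC.txt
f <- "ACGGAATAAAATTCC"

# One element per base
bases <- strsplit(f, "")[[1]]

# Build the columns directly; Location stays an integer vector,
# and stringsAsFactors = FALSE keeps Base as character
A <- data.frame(
  Track = 1,
  Location = seq_along(bases),
  Base = bases,
  stringsAsFactors = FALSE
)

str(A)
```

With this construction no as.numeric conversion is needed afterwards, because Location is numeric from the start.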

Upvotes: 2
