Scientist

Reputation: 1339

Dataframe numbers stuck as string during format manipulation

I have a large dataset which is composed of measurements from two species. I need the results as numbers in a matrix for some analysis, but somehow through trivial manipulation of the dataframe the numbers get converted to strings and cannot be directly converted back into a matrix. To be clear, I repeat the main steps below:

df<-data.frame(
X1=c(1,2,3),
X2=c(4,5,6),
X3=c(7,8,9)
)

med1<-c("name1", "name2", "name3")
rownames(df)<-med1

df2<-as.data.frame(cbind(t(df), species=c("inv1", "inv2", "inv3")))

After the added column is removed, df2 cannot be converted back into the same numeric matrix as df, and I cannot do arithmetic on the restored data. Both objects are data frames:

class(df)
# [1] "data.frame"
class(df2)
# [1] "data.frame"

as.matrix(df)+1 # this operation works fine
as.matrix(t(df2)[1:3,])+1 # this doesn't work, as the figures seem to be stuck as strings
as.matrix(df)==as.matrix(t(df2)[1:3,]) # but the comparison says they're identical?

Please, what is happening, and how can I recover df from df2 as a full numeric matrix?

Upvotes: 0

Views: 40

Answers (1)

r2evans

Reputation: 160447

Use cbind on a frame, not on a matrix. Note that

class(df)
# [1] "data.frame"
class(t(df))
# [1] "matrix" "array" 

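so cbind called on t(df) does not dispatch to cbind.data.frame; it falls back to the default matrix behavior and returns a matrix. A matrix can hold only one type, so t(df) stays numeric only until you combine it with the species vector of strings, which up-converts all numbers to strings. Wrapping the result in as.data.frame afterwards does not undo that: the columns of the resulting frame are no longer numeric.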
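The coercion can be seen in isolation with a minimal sketch. The names m and m_chr are illustrative and not from the question; the same sketch also suggests why the == comparison in the question reports the values as identical.

```r
# A numeric matrix, standing in for what t(df) returns
m <- matrix(1:6 + 0, nrow = 3,
            dimnames = list(c("X1", "X2", "X3"), c("name1", "name2")))
storage_before <- typeof(m)    # storage type before cbind

# cbind with a character vector: a matrix holds a single type,
# so every value is converted to character
m_chr <- cbind(m, species = c("inv1", "inv2", "inv3"))
storage_after <- typeof(m_chr) # storage type after cbind

# == converts the numeric side to character before comparing,
# so the values compare equal even though the types differ
same_values <- all(m == m_chr[, 1:2])
same_object <- identical(m, m_chr[, 1:2])
```

Here same_values is TRUE while same_object is FALSE, which matches the symptom described in the question: the comparison passes, but arithmetic on the character matrix fails.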

Solutions:

  1. cbind on a frame, not a matrix, such as:

    df2 <- cbind(data.frame(t(df)), species=c("inv1", "inv2", "inv3"))
    class(df2)
    # [1] "data.frame"
    str(df2)
    # 'data.frame': 3 obs. of  4 variables:
    #  $ name1  : num  1 4 7
    #  $ name2  : num  2 5 8
    #  $ name3  : num  3 6 9
    #  $ species: chr  "inv1" "inv2" "inv3"
    
  2. Avoid cbind and use transform instead (if using base R):

    df2 <- transform(data.frame(t(df)), species=c("inv1", "inv2", "inv3"))
    

In which case, you can recover the original df with:

data.frame(t(df2[,1:3]))
#       X1 X2 X3
# name1  1  4  7
# name2  2  5  8
# name3  3  6  9
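If df2 has already been built the way the question builds it (with character columns), one possible way to get a numeric matrix back is to convert the storage mode after dropping the species column. This is only a sketch, assuming every remaining value parses as a number; the name m is illustrative.

```r
df <- data.frame(X1 = c(1, 2, 3), X2 = c(4, 5, 6), X3 = c(7, 8, 9))
rownames(df) <- c("name1", "name2", "name3")

# df2 as built in the question: the numbers have become strings
df2 <- as.data.frame(cbind(t(df), species = c("inv1", "inv2", "inv3")))

# Drop the species column, transpose back, then convert the
# character matrix to double in place (dimnames are kept)
m <- t(as.matrix(df2[, 1:3]))
storage.mode(m) <- "double"

m_plus_one <- m + 1  # arithmetic works again
```

Values that do not parse as numbers would become NA with a warning, so it is worth checking anyNA(m) on real data.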

Upvotes: 1
