I am developing a statistical program in R that accepts two data frames. The first holds the patients' demographic information and the second holds their clinical information. The key column in the demographics data frame is the patient ID column, while in the clinical data frame each patient ID is a column. I want to sort the rows of the demographics data frame by patient ID so that they follow the order of the patient ID columns in the clinical data frame. The IDs may be numeric, alphanumeric, or purely alphabetic. I was able to write some code, but I would like guidance on a better way to sort that works regardless of the IDs' data type (character, factor, numeric, etc.).
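library(data.table)  # provides fread()
# Read the demographics file with every column as a factor
demogr = read.csv(mydemoFile, header = TRUE, stringsAsFactors = TRUE,
                  colClasses = c('factor', 'factor', 'factor', 'factor', 'factor'))
# Sort rows by Patient_ID (as.numeric() on a factor returns its level codes)
demogr = demogr[order(as.numeric(demogr$Patient_ID)), ]
# Read the clinical file and move its first column into the row names
myClinicalFrame = fread(myInputFile, header = TRUE, data.table = FALSE, sep = ",")
rowNames = myClinicalFrame[, 1]
myClinicalFrame[, 1] <- NULL
rownames(myClinicalFrame) = rowNames
# Reorder the columns by name (assigning sorted names would only relabel them)
myClinicalFrame = myClinicalFrame[, sort(names(myClinicalFrame)), drop = FALSE]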
The above works for some ID types but fails for others. For example, Patient_ID in the demographics frame is sorted numerically above, and in some situations R converts an ID such as 109999345554545465 to scientific notation (1.099993e+17), which no longer matches the corresponding column name in the clinical data frame.
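One common way this happens can be reproduced in a few lines. This is a minimal sketch, assuming the IDs come from a CSV file (simulated here with read.csv(text = ...) and a made-up two-row file; csv_text and clinical_ids are hypothetical). An 18-digit ID is larger than 2^53, beyond which a double cannot store every integer exactly, so letting read.csv() guess a numeric type loses digits. Reading the column with colClasses = "character" keeps the exact string that appears in the clinical column names.

```r
# Hypothetical demographics CSV with an 18-digit ID
csv_text <- "Patient_ID,state\n109999345554545465,FL\n1234,NJ"

# Default reading guesses a numeric (double) type for an all-digit column
as_num <- read.csv(text = csv_text)
class(as_num$Patient_ID)       # "numeric"
format(as_num$Patient_ID[1])   # "1.099993e+17"

# Reading every column as character keeps the IDs exactly as written
as_chr <- read.csv(text = csv_text, colClasses = "character")

# Hypothetical column names of the clinical data frame
clinical_ids <- c("109999345554545465", "1234")
as.character(as_num$Patient_ID) %in% clinical_ids  # FALSE TRUE
as_chr$Patient_ID %in% clinical_ids                # TRUE TRUE
```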
Thanks
Let's start by creating two example data frames:
patientID = c(123456789012345, 1234, 1234567890, 123)
state = c("FL", "NJ", "CA", "TX")
demog = data.frame(ID = patientID, state = state)
clinical = data.frame(col1 = c(1, 2, 3),
                      col2 = c(3, 4, 5),
                      col3 = c(1, 7, 9),
                      col4 = c(6, 4, 2))
colnames(clinical) = c("1234567890", "123", "123456789012345", "1234")
This gives us:
> demog
            ID state
1 1.234568e+14    FL
2 1.234000e+03    NJ
3 1.234568e+09    CA
4 1.230000e+02    TX
and
> clinical
  1234567890 123 123456789012345 1234
1          1   3               1    6
2          2   4               7    4
3          3   5               9    2
As you can see, the rows in demog are in a different order than the columns in clinical.
To reorder the rows of demog to match, do:
rownames(demog) = demog$ID
demog = demog[colnames(clinical),]
This works even when the IDs are factors or character strings, because assigning row names converts the values to character.
Result:
> demog
                          ID state
1234567890      1.234568e+09    CA
123             1.230000e+02    TX
123456789012345 1.234568e+14    FL
1234            1.234000e+03    NJ
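A variant of the same idea can be sketched with match() instead of row names (a swapped-in technique, not the one used above). It rebuilds the example frames so it runs on its own, compares the IDs as character, and stops with an error if a clinical column has no demographics row; indexing by an unknown row name would instead silently add a row of NAs. The names idx and demog_sorted are illustrative. One caveat applies to numeric IDs in either approach: as.character() keeps only 15 significant digits of a double, so longer IDs should be read as character from the start.

```r
# Example data as above (IDs stored as numbers)
demog <- data.frame(ID = c(123456789012345, 1234, 1234567890, 123),
                    state = c("FL", "NJ", "CA", "TX"))
clinical <- data.frame(col1 = c(1, 2, 3), col2 = c(3, 4, 5),
                       col3 = c(1, 7, 9), col4 = c(6, 4, 2))
colnames(clinical) <- c("1234567890", "123", "123456789012345", "1234")

# Compare as character so numeric, factor and character IDs behave alike
idx <- match(colnames(clinical), as.character(demog$ID))

# Fail loudly if a clinical column has no matching demographics row
if (anyNA(idx)) {
  stop("No demographics row for: ",
       paste(colnames(clinical)[is.na(idx)], collapse = ", "))
}

# Rows of demog in the order of the clinical columns
demog_sorted <- demog[idx, ]
as.character(demog_sorted$state)   # "CA" "TX" "FL" "NJ"
```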