VectorTraverse

Reputation: 23

Why is sapply() taking my matrix and turning it into a list that I cannot factor?

I am using the sapply() function to create a new column of data. In my raw data of observations, every patient receives a number between 1 and 999. Each number has a unique description, and every number falls into one of 27 categories. My problem is that the 27 categories are not given in the raw data, so I have to look them up in a dictionary that maps each of the numbers 1-999 to its category.

Here is the raw data from a data set titled inova9:

     ID AgeGroup  Race SexCode Org_DRGCode
9     9    75-84 White       F         435
10   10    75-84 White       F         441
11   11    45-54 White       F         301
40   40    14-17 White       F         775
70   70    75-84 White       F         853
120 120    55-64 White       M         395

Here is part of my dictionary:

  MSDRG_num                                                MS.DRG_Descriptions_       New_CI_Category
1         1            Heart transplant or implant of heart assist system w MCC      Organ Transplant
2         2          Heart transplant or implant of heart assist system w/o MCC      Organ Transplant
3         3 ECMO or trach w MV 96+ hrs or PDX exc face, mouth & neck w maj O.R. General/Other Surgery
4         4       Trach w MV 96+ hrs or PDX exc face, mouth & neck w/o maj O.R. General/Other Surgery
5         5                     Liver transplant w MCC or intestinal transplant      Organ Transplant
6         6                                            Liver transplant w/o MCC      Organ Transplant

Here are the 27 categories:

> levels(DRG$New_CI_Category)
[1] "Bariatric Surgery"                  "Behavioral"                        
[3] "Cardiovasc Medicine"                "CV Surg - Open Heart"              
[5] "General/Other Surgery"              "GYN Med/Surg"                      
[7] "Hem/Onc Medicine"                   "Interventional Cardiology - EP"    
[9] "Interventional Cardiology - PCI"    "Medicine"                          
[11] "Neonates"                           "Neurology"                         
[13] "Neurosurgery - Brain"               "Neurosurgery - Other"              
[15] "Normal Newborns"                    "OB Deliveries"                     
[17] "OB Other"                           "Organ Transplant"                  
[19] "Ortho Medicine"                     "Ortho Surg - Other"                
[21] "Ortho Surgery - Joints"             "Rehab"                             
[23] "Spine"                              "Thoracic Surgery"                  
[25] "Unspecified"                        "Urology Surgery"                   
[27] "Vascular Procedure - Surgery or IR"

So I need to match inova9$Org_DRGCode against DRG$MSDRG_num in my dictionary and then pull the corresponding category from DRG$New_CI_Category.

I implemented the following:

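# Preallocated one-column matrix (note: the next assignment overwrites it)
ServiceLine1 = matrix(nrow = length(inova9$Org_DRGCode), ncol = 1)
# For each patient i, look up the category whose MSDRG_num equals that patient's Org_DRGCode
ServiceLine1 = sapply(1:length(inova9$Org_DRGCode), function(i)
  as.character(DRG$New_CI_Category[DRG$MSDRG_num == inova9$Org_DRGCode[i]]))
# Convert the looked-up categories to a factor and append them as a new column
Svc = as.factor(ServiceLine1)
inova9 = data.frame(inova9, Svc)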

As you can see, I created a column that I can now merge with my original data one-to-one. I have four data sets like this, but it only works for two of them. For the other two, I receive this error:

> Svc = as.factor(ServiceLine2)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

And my data looks like this:

[[1]]
[1] "Neurology"

[[2]]
[1] "Medicine"

[[3]]
[1] "GYN Med/Surg"

[[4]]
[1] "Vascular Procedure - Surgery or IR"

[[5]]
[1] "Neurology"

[[6]]
[1] "Medicine"

How did sapply() turn my matrix into a list, and how do I stop it from happening?

Upvotes: 1

Views: 131

Answers (2)

shadowtalker

Reputation: 13853

This happens because sapply is a wrapper for lapply that tries to simplify its result. If every call returns a value of length one, you get a vector; if every call returns the same length greater than one, you get a matrix. When the lengths differ, it can't simplify and falls back to a list, because that is what lapply returns.
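A minimal illustration of that fallback, using made-up inputs rather than your data: when every call returns one string, sapply() gives a character vector, and a single zero-length result (which is what a code with no dictionary match produces) is enough to make it return a list.

```r
# Every call returns exactly one string, so sapply() simplifies to a character vector
ok <- sapply(1:3, function(i) c("a", "b", "c")[i])
class(ok)   # "character"

# The second call returns character(0), as a code with no dictionary match would,
# so the results have unequal lengths and sapply() falls back to a list
bad <- sapply(1:3, function(i) if (i == 2) character(0) else "x")
class(bad)  # "list"
```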

Now, I'm not entirely sure why that's happening here. Just reading your code, I would also expect sapply to return a vector and not a list. One possibility is that, for some value of i, the expression as.character(DRG$New_CI_Category[DRG$MSDRG_num==inova9$Org_DRGCode[i]]) has a length other than one: greater than one if the dictionary contains duplicate codes, or zero if inova9$Org_DRGCode[i] has no match in DRG$MSDRG_num. You can check this with any(sapply(ServiceLine2, length) != 1).
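To find which rows are responsible, a sketch along these lines lists the offending positions and their codes. The `codes` and `ServiceLine2` values here are hypothetical stand-ins (pretending the second code had no match); with real data you would use your ServiceLine2 and the Org_DRGCode column of the data set that fails.

```r
# Hypothetical stand-ins: pretend the second code had no match in the dictionary
codes <- c(435, 441, 301)
ServiceLine2 <- list("Neurology", character(0), "Medicine")

# lengths() returns the length of each list element; anything other than 1
# is what prevents sapply() from simplifying to a vector
bad_idx <- which(lengths(ServiceLine2) != 1)
bad_idx          # 2
codes[bad_idx]   # 441: the code(s) to check in DRG$MSDRG_num
```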

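In any case, the function unlist will flatten a list down to a vector, so you can do as.factor(unlist(ServiceLine2)). Be careful, though: unlist drops zero-length elements, so if any code had no match, the result will be shorter than your data and will no longer line up row for row.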
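A small sketch of that pitfall, plus one stricter alternative using vapply(), which is a swapped-in technique rather than part of the answer. The list below is a hypothetical result in which the second lookup found nothing.

```r
# Hypothetical result in which the second lookup found no match
ServiceLine2 <- list("Neurology", character(0), "Medicine")

# unlist() silently drops the zero-length element, so the vector is shorter
# than the number of patients and no longer lines up row for row
length(unlist(ServiceLine2))   # 2, not 3

# Stricter alternative: vapply() insists on exactly one string per element.
# Anything that is not exactly one match (none or several) becomes NA first,
# so there is always one entry per patient.
svc <- vapply(ServiceLine2,
              function(x) if (length(x) == 1) x else NA_character_,
              FUN.VALUE = character(1))
svc              # "Neurology" NA "Medicine"
Svc <- as.factor(svc)
levels(Svc)      # "Medicine" "Neurology"
```

The design choice is that vapply() raises an error instead of quietly returning a list when a call yields the wrong length, so mismatches surface immediately rather than at the as.factor() step.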

Upvotes: 1

Ricardo Saporta

Reputation: 55350

You might save yourself a headache by converting your data to data.tables, setting keys, and then simply joining.

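library(data.table)
DT.pat  <- as.data.table(inova9)   # patient observations
DT.dict <- as.data.table(DRG)      # dictionary: MSDRG_num -> New_CI_Category

## Key both tables on the DRG code you want to join on
## (setkey also sorts each table by its key)
setkey(DT.pat,  Org_DRGCode)
setkey(DT.dict, MSDRG_num)

## Update join: copy New_CI_Category from DT.dict into the rows of DT.pat
## whose code matches; patients with no matching code get NA
DT.pat[DT.dict, New_CI_Category := i.New_CI_Category]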

Make sure the keys are of the same type, meaning that they are both factor or both character, etc.
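If you would rather not add a data.table dependency, the same lookup can be sketched in base R with match(). This is a swapped-in technique, not part of the answer above, and the toy inova9 and DRG values are hypothetical stand-ins for the question's data. Unlike a keyed join, it leaves the row order of inova9 unchanged, and codes missing from the dictionary come back as NA instead of disappearing.

```r
# Hypothetical toy versions of the question's inova9 and DRG (values made up)
inova9 <- data.frame(ID = c(9, 10, 11), Org_DRGCode = c(5, 3, 1))
DRG <- data.frame(
  MSDRG_num = 1:6,
  New_CI_Category = factor(c("Organ Transplant", "Organ Transplant",
                             "General/Other Surgery", "General/Other Surgery",
                             "Organ Transplant", "Organ Transplant"))
)

# match() gives, for each patient code, the row of DRG with that MSDRG_num,
# or NA if the code is absent, so there is exactly one result per patient
idx <- match(inova9$Org_DRGCode, DRG$MSDRG_num)
inova9$Svc <- DRG$New_CI_Category[idx]   # stays a factor; unmatched codes give NA
inova9
```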

Upvotes: 3
