idemanalyst

Reputation: 147

Filling NAs with multiple values in R

I'm working with a dataset in R that has missing observations in my vector FirstOfHCPCS.Code. I want to fill in those missing HCPCS codes based on the value in another vector, FirstOfService.Description. Not every NA will be filled with the same value; there are 6 possible values an NA could be coded as. I tried running a loop to fill in the NAs, but I think because I don't have EVERY FirstOfService.Description listed in the loop, R doesn't know what to do with those values. Here is my loop and the resulting error (updated with canary's suggestion):

    for (i in 1:248308) {
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c(
            "State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
            "Local Psychiatric Hospital/IMD PT68",
            "Local Psychiatric Hospital - Acute Community PT73",
            "State Psychiatric Hospital - Inpatient PT22")) {
        Master$FirstOfHCPCS.Code[i] <- 2
      }
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c(
            "Inpatient Hospital Ancillary Services - Room and Board",
            "Inpatient Hospital Ancillary Services - Leave of Absence",
            "Inpatient Hospital Ancillary Services - Pharmacy",
            "Inpatient Hospital Ancillary Services - Medical/Surgical Supplies and Devices",
            "Inpatient Hospital Ancillary Services - Laboratory",
            "Inpatient Hospital Ancillary Services -EKG/ECG",
            "Inpatient Hospital Ancillary Services - EEG",
            "Inpatient Hospital Ancillary Services - Psychiatric/Psychological Treatments/Services",
            "Inpatient Hospital Ancillary Services - Other Diagnosis Services",
            "Inpatient Hospital Ancillary Services - Other Therapeutic Services",
            "Inpatient Hospital Ancillary Services - Radiology",
            "Inpatient Hospital Ancillary Services - Respiratory Services",
            "Inpatient Hospital Ancillary Services -Physical Therapy",
            "Inpatient Hospital Ancillary Services - Occupational Therapy",
            "Inpatient Hospital Ancillary Services - Speech-Language Pathology",
            "Inpatient Hospital Ancillary Services - Emergency Room",
            "Inpatient Hospital Ancillary Services - Pulmonary Function",
            "Inpatient Hospital Ancillary Services - Audiology",
            "Inpatient Hospital Ancillary Services - Magnetic Resonance Technology (MRT)",
            "Inpatient Hospital Ancillary Services - Pharmacy",
            "Additional Codes-ECT Facility Charge")) {
        Master$FirstOfHCPCS.Code[i] <- 1
      }
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c("Pharmacy (Drugs and Other Biologicals)")) {Master$FirstOfHCPCS.Code[i] <- 3}
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c("Crisis Observation Care")) {Master$FirstOfHCPCS.Code[i] <- 4}
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c("Outpatient Partial Hospitalization")) {Master$FirstOfHCPCS.Code[i] <- 5}
      if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in% c("Other")) {Master$FirstOfHCPCS.Code[i] <- 6}
    }

Error in if (is.na(Master$FirstOfHCPCS.Code[i]) & Master$FirstOfService.Description[i] %in%  : 
  argument is of length zero
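The error means the condition handed to if() had length zero. A minimal sketch of one way to reproduce it, assuming (the question doesn't confirm this) that a column referenced in the loop doesn't exist under that exact name: $ then returns NULL, NULL %in% c(...) is logical(0), and TRUE & logical(0) stays zero-length. The toy data frame is a hypothetical stand-in for Master; checking names(Master) would confirm or rule this cause out.

```r
# Hypothetical stand-in for Master: it has no FirstOfService.Description column
toy <- data.frame(FirstOfHCPCS.Code = c(NA, 85))

# $ on a column name that doesn't exist returns NULL
desc <- toy$FirstOfService.Description
print(is.null(desc))  # TRUE

# NULL %in% ... is logical(0); combining it with & keeps length zero
cond <- is.na(toy$FirstOfHCPCS.Code[1]) & desc[1] %in% c("Other")
print(length(cond))  # 0

# if() refuses a zero-length condition
msg <- tryCatch(if (cond) "filled", error = function(e) conditionMessage(e))
print(msg)  # "argument is of length zero"
```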

I also ran sum(is.na(Master$FirstOfHCPCS.Code)) to find out how many rows have an NA (27186) and then replaced the 248308 in the loop with that number, but I still get the same error as above. How do I fill the NAs with multiple values? Thanks for your help!
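Replacing 248308 with the NA count doesn't target the NA rows: 1:27186 simply visits the first 27186 rows, whichever they are. If a loop is kept, a sketch that visits only the rows that are actually NA could use which(is.na(...)). The toy data frame and the trimmed-down desc_to_code lookup are hypothetical stand-ins for Master and the full mapping.

```r
# Small hypothetical stand-in for Master
toy <- data.frame(
  FirstOfService.Description = c("Crisis Observation Care", "Wraparound", "Other"),
  FirstOfHCPCS.Code = c(NA, 85, NA),
  stringsAsFactors = FALSE
)

# Hypothetical, trimmed-down description -> code mapping
desc_to_code <- c("Crisis Observation Care" = 4, "Other" = 6)

# Indices of the rows that are NA, rather than 1:(number of NAs)
na_rows <- which(is.na(toy$FirstOfHCPCS.Code))
print(na_rows)  # 1 3

for (i in na_rows) {
  d <- toy$FirstOfService.Description[i]
  # Only assign when the description has a mapping; other rows stay NA
  if (d %in% names(desc_to_code)) {
    toy$FirstOfHCPCS.Code[i] <- desc_to_code[[d]]
  }
}
print(toy$FirstOfHCPCS.Code)  # 4 85 6
```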

Per request, here is sample code and the desired output (Desired_FirstOfHCPCS.Code):

   ##Sample Code##

FirstOfService.Description<-c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65","Wraparound", "Inpatient Hospital Ancillary Services - Room and Board",
                              "Pharmacy (Drugs and Other Biologicals)","Local Psychiatric Hospital - Acute Community PT73","State Psychiatric Hospital - Inpatient PT22","Case Management","Crisis Observation Care","Outpatient Partial Hospitalization",
                              "Other")
Desired_FirstOfHCPCS.Code<-c(2, 85, 1, 3, 2, 2, 11, 4, 5, 6)

FirstOfHCPCS.Code<-c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)

df<-data.frame(FirstOfService.Description, FirstOfHCPCS.Code)

df

Output:

                                    FirstOfService.Description FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                NA
2                                                   Wraparound                85
3       Inpatient Hospital Ancillary Services - Room and Board                NA
4                       Pharmacy (Drugs and Other Biologicals)                NA
5            Local Psychiatric Hospital - Acute Community PT73                NA
6                  State Psychiatric Hospital - Inpatient PT22                NA
7                                              Case Management                11
8                                      Crisis Observation Care                NA
9                           Outpatient Partial Hospitalization                NA
10                                                       Other                NA

What I want it to look like:

#Desired Output
df2<-data.frame(FirstOfService.Description, Desired_FirstOfHCPCS.Code)
df2

                                    FirstOfService.Description Desired_FirstOfHCPCS.Code
1  State Mental Retardation Facility - Inpatient (ICF/MR) PT65                         2
2                                                   Wraparound                        85
3       Inpatient Hospital Ancillary Services - Room and Board                         1
4                       Pharmacy (Drugs and Other Biologicals)                         3
5            Local Psychiatric Hospital - Acute Community PT73                         2
6                  State Psychiatric Hospital - Inpatient PT22                         2
7                                              Case Management                        11
8                                      Crisis Observation Care                         4
9                           Outpatient Partial Hospitalization                         5
10                                                       Other                         6

Upvotes: 0

Views: 216

Answers (1)

canary_in_the_data_mine

Reputation: 2393

First off, it'd be useful to have some reproducible code so we know what you're working with (we don't know what your dataframe consists of).

Otherwise, it looks like there are two problems.

1) You can't use == NA; instead, use is.na().

NA == NA
[1] NA
is.na(NA)
[1] TRUE
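The same point on a vector, and why it matters inside if(): comparing with == NA gives NA for every element, and if() cannot branch on NA. A small sketch with made-up values:

```r
codes <- c(85, NA, 11)  # made-up values

print(codes == NA)   # NA NA NA: any comparison with NA is NA
print(is.na(codes))  # FALSE TRUE FALSE

# if() errors when its condition is NA
msg <- tryCatch(if (codes[2] == NA) "missing", error = function(e) conditionMessage(e))
print(msg)  # "missing value where TRUE/FALSE needed"
```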

2) Another problem is that you're using ANDs rather than ORs. In your first condition, a single description can't be both "State mental retardation facility..." AND "Local psychiatric hospital...".

Instead, try using %in%, e.g.:

is.na(Master$FirstOfHCPCS.Code[i]) & 
Master$FirstOfService.Description[i] %in% c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Local Psychiatric Hospital/IMD PT68")
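To see why the AND version never matches, compare the three forms on a few descriptions taken from the question: chaining == with & is FALSE for every element, while chaining with | and using %in% agree.

```r
desc <- c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
          "Local Psychiatric Hospital/IMD PT68",
          "Wraparound")
targets <- c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65",
             "Local Psychiatric Hospital/IMD PT68")

# A single string can't equal two different strings at once
with_and <- desc == targets[1] & desc == targets[2]
# "Equals either one" is what the condition is meant to express
with_or <- desc == targets[1] | desc == targets[2]
with_in <- desc %in% targets

print(with_and)  # FALSE FALSE FALSE
print(with_or)   # TRUE TRUE FALSE
print(with_in)   # TRUE TRUE FALSE
```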

There are quite a few other ways this code could be cleaned up (the for loops and manual assignments are pretty time-consuming and error-prone here), but that's a start.
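One way to drop the loop entirely is a named lookup vector keyed by description, applied only where the code is NA. A sketch on the question's sample data; desc_to_code is a hypothetical name and lists only the descriptions in the sample, so the rest of the descriptions from the loop would be added the same way:

```r
FirstOfService.Description <- c("State Mental Retardation Facility - Inpatient (ICF/MR) PT65", "Wraparound",
  "Inpatient Hospital Ancillary Services - Room and Board", "Pharmacy (Drugs and Other Biologicals)",
  "Local Psychiatric Hospital - Acute Community PT73", "State Psychiatric Hospital - Inpatient PT22",
  "Case Management", "Crisis Observation Care", "Outpatient Partial Hospitalization", "Other")
FirstOfHCPCS.Code <- c(NA, 85, NA, NA, NA, NA, 11, NA, NA, NA)
df <- data.frame(FirstOfService.Description, FirstOfHCPCS.Code, stringsAsFactors = FALSE)

# Hypothetical lookup: description -> code (extend with the remaining descriptions)
desc_to_code <- c(
  "State Mental Retardation Facility - Inpatient (ICF/MR) PT65" = 2,
  "Local Psychiatric Hospital - Acute Community PT73" = 2,
  "State Psychiatric Hospital - Inpatient PT22" = 2,
  "Inpatient Hospital Ancillary Services - Room and Board" = 1,
  "Pharmacy (Drugs and Other Biologicals)" = 3,
  "Crisis Observation Care" = 4,
  "Outpatient Partial Hospitalization" = 5,
  "Other" = 6
)

# Only touch rows whose code is missing
na_rows <- is.na(df$FirstOfHCPCS.Code)
# Index the lookup by description; descriptions not in the lookup give NA
filled <- unname(desc_to_code[as.character(df$FirstOfService.Description[na_rows])])
df$FirstOfHCPCS.Code[na_rows] <- filled

print(df$FirstOfHCPCS.Code)  # 2 85 1 3 2 2 11 4 5 6
```

A description that isn't in the lookup simply leaves that row NA instead of stopping with an error, and as.character() guards against the column being a factor (the default for data.frame() before R 4.0).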

Upvotes: 2
