Jorge A

Reputation: 57

Ops "-" only defined for equally-sized data frames inside a function

I have a dataset of patients with several numerical variables including age (in decimal), height, weight, gender, BMI, and triglycerides. I want to create new variables like talla_z, peso_z, trigliceridos_z, which are the z-scores for each variable.

The age values are in decimals, so they need to be converted to match values in my z-score lookup table, e.g., 12.48 should match 12.5, not 12 in the table.

Here's the format of the lookup table for weight (Peso), with the median (P50) and standard deviation (DS) for males (H prefix) and females (M prefix):

PESO

Edad   HP50    HDS     MP50    MDS
10     36.05   7.32    36.11   6.26
...
17.5   69.25   10.1    58.16   8.3

Here's the format for the patient data:

Edad    Peso1   IMC1   Trig1   Talla1
11.43   84      32     22      180
...
17.3    69      25     24      210

How can I create a function in R to automatically assign z-scores to each individual based on age and gender?

The solution I'm trying is this one.

You can create a function to calculate the z-score for each patient based on their age, gender, and the given variable. You can round the decimal age to the nearest value that exists in the z-score lookup table:

# z-score
calculate_z_score <- function(patients_data, p50_ds_table, variable_name) {
  # new column
  z_column <- numeric(nrow(patients_data))
  
  # Iteration
  for (i in seq_len(nrow(patients_data))) {
    # rounding
    rounded_age <- round(patients_data$Edad[i] * 2) / 2
    
    # match rows
    row_index <- which(p50_ds_table$Edad == rounded_age)
    
    if (patients_data$Sexo[i] == "Hombres") {
      p50 <- p50_ds_table[row_index, "HP50"]
      ds <- p50_ds_table[row_index, "HDS"]
    } else {
      p50 <- p50_ds_table[row_index, "MP50"]
      ds <- p50_ds_table[row_index, "MDS"]
    }
    pd <- patients_data[i, variable_name]
    # z score calc
    z_column[i] <- (pd - p50) / ds
  }
  
  # column in data.frame
  patients_data[paste(variable_name, "_z", sep = "")] <- z_column
  
  return(patients_data)
}

# example
patients_data <- calculate_z_score(patients_data, p50_ds_table, "Peso")

This function iterates over the patient data and rounds each age to the nearest half year so that it matches a row in p50_ds_table. It then calculates the z-score for the given variable from the matching median and standard deviation. But I think it cannot take the gender into account.

When I try it, I get this error from Ops.data.frame(pd, p50): "-" only defined for equally-sized data frames.

When I extract the individual values, patients_data[1,"Peso"] - p50[1,"HP50"] for example, it works.

I've found this solution, but I cannot see how to apply it to my example.

How can I make this work?

Upvotes: 0

Views: 833

Answers (1)

Jorge A

Reputation: 57

Thanks to @Limey, I found this solution using dplyr:

library(dplyr)

# Join the lookup table onto the patient table by the matching key columns
df <- table %>%
  inner_join(reftable,
             by = c("var1" = "varr", "var2" = "varr", ...))

This way, the values from the lookup table are joined directly onto the patient rows. Afterwards, you can use mutate() to compute the new column from the joined values.
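For the data in the question, the join could look roughly like the sketch below. It is only an illustration under some assumptions: the two lookup rows are the ones quoted in the question, the patient rows and the Sexo values ("Hombres", "Mujeres") are made up, Edad_ref, p50 and ds are hypothetical helper columns, and the weight column is called Peso as in the function call. It uses left_join() rather than inner_join() so that patients without a matching age row are kept with NA instead of being dropped.

```r
library(dplyr)

# Lookup table in the layout from the question (only the two quoted rows).
p50_ds_table <- tibble(
  Edad = c(10, 17.5),
  HP50 = c(36.05, 69.25), HDS = c(7.32, 10.1),
  MP50 = c(36.11, 58.16), MDS = c(6.26, 8.3)
)

# Made-up patients for illustration; the Sexo coding is an assumption.
patients_data <- tibble(
  Edad = c(10.2, 17.3),
  Sexo = c("Hombres", "Mujeres"),
  Peso = c(40, 69)
)

patients_z <- patients_data %>%
  # Round the decimal age to the nearest half year used by the lookup table.
  mutate(Edad_ref = round(Edad * 2) / 2) %>%
  # Bring the reference values onto each patient row.
  left_join(p50_ds_table, by = c("Edad_ref" = "Edad")) %>%
  # Pick the male or female reference columns, then compute the z-score.
  mutate(
    p50 = if_else(Sexo == "Hombres", HP50, MP50),
    ds = if_else(Sexo == "Hombres", HDS, MDS),
    Peso_z = (Peso - p50) / ds
  ) %>%
  select(Edad, Sexo, Peso, Peso_z)
```

Because everything happens on whole columns, there is no row-by-row subtraction of one data frame from another, which is where the "equally-sized data frames" error came from in the loop version.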

Upvotes: 0
