Reputation: 57
I have a patient dataset with several variables: age (in decimal years), height, weight, gender, BMI, and triglycerides. I want to create new variables such as talla_z, peso_z, and trigliceridos_z, which hold the z-score for each variable.
The ages are decimals, so they need to be rounded to the nearest age that exists in my z-score lookup table; e.g., 12.48 should match 12.5 in the table, not 12.
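As a minimal sketch of that rounding, assuming the lookup table advances in half-year steps (10, 10.5, ..., 17.5), multiplying by 2, rounding, and dividing by 2 snaps an age to the nearest half year. The helper name `snap_to_half` is hypothetical and used only for illustration.

```r
# Hypothetical helper: snap decimal ages to the nearest 0.5
# (assumes the lookup table uses half-year steps).
snap_to_half <- function(age) round(age * 2) / 2

ages <- c(12.48, 11.43, 17.3)
snapped <- snap_to_half(ages)
print(snapped)  # 12.5 11.5 17.5

# Note: R's round() sends exact .5 ties to the even number,
# so an age of exactly 12.25 (12.25 * 2 = 24.5) snaps down to 12.
tie <- snap_to_half(12.25)
```

If ties should always go up instead, `floor(age * 2 + 0.5) / 2` is one alternative.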
Here's the format of the lookup table for weight (Peso), with the median (P50) and standard deviation (DS) for each age; the H columns are for males (Hombres) and the M columns for females (Mujeres):
PESO
Edad   HP50    HDS    MP50    MDS
10     36.05   7.32   36.11   6.26
...
17.5   69.25   10.1   58.16   8.3
Here's the format for the patient data:
Edad    Peso1   IMC1   Trig1   Talla1
11.43   84      32     22      180
...
17.3    69      25     24      210
How can I create a function in R to automatically assign z-scores to each individual based on age and gender?
The solution I'm trying is this one.
You can create a function to calculate the z-score for each patient based on their age, gender, and the given variable. You can round the decimal age to the nearest value that exists in the z-score lookup table:
# Compute the z-score of one variable for every patient
calculate_z_score <- function(patients_data, p50_ds_table, variable_name) {
  # Vector that will hold the new column
  z_column <- numeric(nrow(patients_data))
  # Iterate over the patients
  for (i in seq_len(nrow(patients_data))) {
    # Round the age to the nearest 0.5, the step used in the lookup table
    rounded_age <- round(patients_data$Edad[i] * 2) / 2
    # Find the lookup row for that age
    row_index <- which(p50_ds_table$Edad == rounded_age)
    # Pick the median and SD columns that match the patient's gender
    if (patients_data$Sexo[i] == "Hombres") {
      p50 <- p50_ds_table[row_index, "HP50"]
      ds <- p50_ds_table[row_index, "HDS"]
    } else {
      p50 <- p50_ds_table[row_index, "MP50"]
      ds <- p50_ds_table[row_index, "MDS"]
    }
    pd <- patients_data[i, variable_name]
    # z-score: (value - median) / SD
    z_column[i] <- (pd - p50) / ds
  }
  # Add the z-scores as a new column, e.g. "Peso_z"
  patients_data[paste(variable_name, "_z", sep = "")] <- z_column
  return(patients_data)
}
# Example
patients_data <- calculate_z_score(patients_data, p50_ds_table, "Peso")
This function iterates through the patient data and rounds each age to the nearest value in p50_ds_table. It then calculates the z-score for the given variable based on the patient's age. But I don't think it takes gender into account.
When I run it, I get: Error in Ops.data.frame(pd, p50) : '-' only defined for equally-sized data frames.
When I extract the individual values, for example patients_data[1, "Peso"] - p50[1, "HP50"], it works.
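For reference, a minimal sketch of one way this error can arise. It assumes the indexing returns data frames rather than plain numbers (for example because the tables are tibbles) and that some rounded age has no matching row in the lookup table, so one side of the subtraction has zero rows. The objects `pd`, `hit`, and `miss` are made up for illustration and are not the real data.

```r
pd   <- data.frame(Peso = 84)            # 1 x 1 data frame (one patient value)
hit  <- data.frame(HP50 = 36.05)         # 1 x 1: the lookup found one row
miss <- data.frame(HP50 = numeric(0))    # 0 x 1: the lookup found no row

# Same dimensions on both sides: the subtraction works.
ok <- pd - hit

# Different dimensions: this reproduces the "equally-sized" error.
msg <- tryCatch(pd - miss, error = function(e) conditionMessage(e))
print(msg)

# Extracting plain numeric values with [[ ]] avoids data-frame arithmetic.
z <- (pd[["Peso"]][1] - hit[["HP50"]][1]) / 7.32
```

Under these assumptions, checking that `length(row_index) == 1` before computing the z-score, and storing `NA` otherwise, would keep the loop from failing on ages outside the table.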
I've found this solution, but I can't see how to apply it to my example.
How can I make this work?
Upvotes: 0
Views: 833
Reputation: 57
The solution I found, thanks to @Limey, uses dplyr:
library(dplyr)
df <- table %>%
  inner_join(reftable,
             by = c("var1" = "varr", "var2" = "varr", ...))
This way, the values of the lookup table are joined directly to the matching patient rows. Afterwards, you can use mutate() to compute the new column from the joined values.
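A minimal sketch of how the join approach could look for the weight table, under several assumptions: the patient data has a `Sexo` column with the values "Hombres" and "Mujeres", the lookup table uses half-year steps, and the wide table is first reshaped to one row per age and sex. The patient rows and the names `peso_ref`, `peso_long`, `Edad_ref`, and `patients_z` are hypothetical. Only the two lookup rows shown in the question are used. The sketch swaps in `left_join` for `inner_join` so that patients without a matching lookup row stay in the result with an `NA` z-score instead of being dropped.

```r
library(dplyr)

# Lookup table in the question's layout (only the two rows shown there).
peso_ref <- data.frame(
  Edad = c(10, 17.5),
  HP50 = c(36.05, 69.25), HDS = c(7.32, 10.1),
  MP50 = c(36.11, 58.16), MDS = c(6.26, 8.3)
)

# Reshape to one row per (age, sex) so that sex can be a join key.
peso_long <- bind_rows(
  transmute(peso_ref, Edad, Sexo = "Hombres", P50 = HP50, DS = HDS),
  transmute(peso_ref, Edad, Sexo = "Mujeres", P50 = MP50, DS = MDS)
)

# Hypothetical patients; the Sexo column is an assumption.
patients <- data.frame(
  Edad  = c(17.3, 10.1, 11.43),
  Sexo  = c("Hombres", "Mujeres", "Hombres"),
  Peso1 = c(69, 40, 84)
)

patients_z <- patients %>%
  mutate(Edad_ref = round(Edad * 2) / 2) %>%             # snap age to 0.5 steps
  left_join(peso_long,
            by = c("Edad_ref" = "Edad", "Sexo" = "Sexo")) %>%
  mutate(peso_z = (Peso1 - P50) / DS) %>%                # (value - median) / SD
  select(-P50, -DS)

print(patients_z)
```

The third patient rounds to 11.5, which is not in this two-row table, so its `peso_z` is `NA`. The same pattern could be repeated for the height, BMI, and triglyceride tables.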
Upvotes: 0