Andrew T

Reputation: 95

R: How to summarize data from multiple ordered factors into one variable

I have data representing the severity of patients' asthma symptoms under different conditions. The severity variables are ordered factors, all with the same levels (Mild < Moderate < Severe). Here is a simplified example:

# Create example data frame
df <- data.frame(
  ID = c(1:5),
  Daytime = c("Mild", "Severe", "Mild", "Moderate", "Moderate"), # severity of daytime symptoms
  Sleep = c("Moderate", NA, "Mild", "Mild", "Moderate"), # severity of nighttime symptoms
  Activity = c("Mild", "Moderate", "Mild", "Moderate", "Severe") # severity of symptoms during activity
  )

# Specify order of factor levels
df$Daytime <- ordered(
  df$Daytime,
  levels = c("Mild",
             "Moderate",
             "Severe")
  )
df$Sleep <- ordered(
  df$Sleep,
  levels = c("Mild",
             "Moderate",
             "Severe")
  )
df$Activity <- ordered(
  df$Activity,
  levels = c("Mild",
             "Moderate",
             "Severe")
)

df

The resulting data frame looks like this:

  ID  Daytime    Sleep Activity
1  1     Mild Moderate     Mild
2  2   Severe     <NA> Moderate
3  3     Mild     Mild     Mild
4  4 Moderate     Mild Moderate
5  5 Moderate Moderate   Severe

I'm trying to create an "overall severity" variable, where a patient's overall severity is the most severe level reported in any of the three categories (Daytime, Sleep, and Activity). That is, Overall equals the highest level among Daytime, Sleep, and Activity, ignoring missing values. The result would look like this:

  ID  Daytime    Sleep Activity  Overall
1  1     Mild Moderate     Mild Moderate
2  2   Severe     <NA> Moderate   Severe
3  3     Mild     Mild     Mild     Mild
4  4 Moderate     Mild Moderate Moderate
5  5 Moderate Moderate   Severe   Severe

I'd like to do this without writing a big, clunky for loop, but I can't figure out how. I thought I could use ave(), but it doesn't seem to work on multiple variables at once:

> df$Overall <- ave(c(df$Daytime, df$Sleep, df$Activity),
+                 df$ID,
+                 FUN = function(i) max (i, na.rm=T)
+                 )
Error in `$<-.data.frame`(`*tmp*`, "Worst", value = c(2L, 3L, 1L, 2L,  : 
  replacement has 15 rows, data has 5

Is there an apply function that can do this?

Upvotes: 1

Views: 323

Answers (1)

meuleman

Reputation: 378

One quick way of doing this would be:

df$Overall <- apply(df[, 2:4], 1, max, na.rm = TRUE)
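One caveat worth keeping in mind: apply() coerces the data frame to a character matrix, so max() compares the labels as strings, and the result is a plain character vector rather than an ordered factor. It works here because "Mild" < "Moderate" < "Severe" happens to also be alphabetical order. If the labels did not sort alphabetically, a possible alternative is to take the row-wise maximum of the integer level codes with pmax() and map the codes back to labels. This is only a sketch; the helper names lvls, cols, and codes are made up for illustration:

```r
# Rebuild the example data from the question
lvls <- c("Mild", "Moderate", "Severe")   # shared level order (illustrative name)
df <- data.frame(
  ID = 1:5,
  Daytime  = c("Mild", "Severe", "Mild", "Moderate", "Moderate"),
  Sleep    = c("Moderate", NA, "Mild", "Mild", "Moderate"),
  Activity = c("Mild", "Moderate", "Mild", "Moderate", "Severe")
)

# Convert all three severity columns to ordered factors in one step
cols <- c("Daytime", "Sleep", "Activity")
df[cols] <- lapply(df[cols], ordered, levels = lvls)

# Element-wise (row-wise) maximum of the integer level codes, skipping NAs
codes <- do.call(pmax, c(lapply(df[cols], as.integer), na.rm = TRUE))

# Map the codes back to labels, keeping the result an ordered factor
df$Overall <- ordered(lvls[codes], levels = lvls)

df
```

A row where all three values are missing would come out as NA here, and the result keeps its ordering, so comparisons such as df$Overall >= "Moderate" still work.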

Upvotes: 4
