aelhak

Reputation: 415

Lock in factor level order when factor level is duplicated

I am trying to lock in the factor level order of a variable for the purpose of getting results to show in a plot in the same order as the dataframe.

data$outcome <- factor(data$outcome, levels = unadjusted.combined$outcome)

Is there a way to fix the order when some of the rows are duplicated? My rows repeat because I have results for two different measures, which produces an error:

Error in `levels<-`(`*tmp*`, value = if (nl == nL) as.character(labels) else paste0(labels,  : factor level [54] is duplicated

My data look something like this:

xvar   outcome
x1     wt_2
x1     wt_3
x1     wt_4
x1     bmi_2
x1     bmi_3  
x1     bmi_4
x2     wt_2
x2     wt_3
x2     wt_4
x2     bmi_2
x2     bmi_3  
x2     bmi_4

At the end, I produce my plot as follows (currently the results are in alphabetical order):

ggplot(data=data, aes(x=outcome, y=estimate, ymin=lci, ymax=uci, colour=xvar)) +  geom_pointrange() + geom_hline(yintercept=0, lty=2) + coord_flip()

Upvotes: 1

Views: 1026

Answers (1)

Dave2e

Reputation: 24079

Here is an attempt at an answer. From your question you would like the outcome to follow the order of the original data and not in the default alphabetical order.
Here is some example data:

data<-structure(list(xvar = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 
                      2L, 2L, 2L, 2L, 2L), .Label = c("x1", "x2"), class = "factor"), 
           outcome = structure(c(4L, 5L, 6L, 1L, 2L, 3L, 4L, 5L, 6L, 
                     1L, 2L, 3L), .Label = c("bmi_2", "bmi_3", "bmi_4", "wt_2", 
                      "wt_3", "wt_4"), class = "factor"), estimate = c(10.40, 
                       11.24, 10.09, 14.64, 10.48, 8.71, 13.27, 8.87, 9.97, 
                       13.12, 12.17, 8.44)), row.names = c(NA, 
                                          -12L), class = "data.frame")

Both xvar and outcome are stored as factors, and their levels default to alphabetical order. If we run the ggplot command:

ggplot(data=data, aes(x=outcome, y=estimate,  colour=xvar, ymin=0, ymax=15)) +  
  geom_pointrange() + geom_hline(yintercept=0, lty=2) #+ coord_flip()

The x-axis is in alphabetical order.

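Now, to preserve the original order, pass the levels in the desired order using levels=. Wrapping the column in unique() drops the repeated values (the cause of the "factor level is duplicated" error) while keeping the order of first appearance. The ordered=TRUE option additionally marks the factor as ordered; it is the levels= argument that determines the plotting order.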

#order the factors in the order they appear in the data frame
data$outcome <- factor(data$outcome, levels= unique(data$outcome), ordered=TRUE)

ggplot(data=data, aes(x=outcome, y=estimate,  colour=xvar, ymin=0, ymax=15)) +  
  geom_pointrange() + geom_hline(yintercept=0, lty=2) #+ coord_flip()

Now the original order is maintained.
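To see why unique() matters here, the following is a minimal base R sketch. The outcome vector is a made-up stand-in for the column in the question, not the asker's real data, and res and f are illustrative names.

```r
# Hypothetical stand-in for the outcome column: each value appears twice
outcome <- c("wt_2", "wt_3", "wt_4", "bmi_2", "bmi_3", "bmi_4",
             "wt_2", "wt_3", "wt_4", "bmi_2", "bmi_3", "bmi_4")

# Passing the raw column as levels fails because values repeat;
# tryCatch() captures the error message instead of stopping the script
res <- tryCatch(factor(outcome, levels = outcome),
                error = function(e) conditionMessage(e))

# unique() keeps the first occurrence of each value, in order of appearance
f <- factor(outcome, levels = unique(outcome))
levels(f)
```

The same idea should apply to the call in the question, for example levels = unique(unadjusted.combined$outcome), assuming that column already lists the outcomes in the desired order. If coord_flip() is kept, the first level is drawn at the bottom of the axis, so rev(unique(...)) may be closer to what is wanted.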

This is where your question gets fuzzy. If you want to plot x1 then x2 with the outcomes in the same order, then you need to create a new variable by paste(data$xvar, data$outcome) and then use this new variable as the x-axis.
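A rough sketch of that last suggestion, assuming a small made-up data frame in the same shape as the example above. The column name xvar_outcome and the helper variable combo are illustrative choices only.

```r
# Small made-up data set shaped like the example data
data <- data.frame(
  xvar     = rep(c("x1", "x2"), each = 3),
  outcome  = rep(c("wt_2", "wt_3", "bmi_2"), times = 2),
  estimate = c(10.4, 11.2, 14.6, 13.3, 8.9, 13.1)
)

# Combine the two columns into one label per row
combo <- paste(data$xvar, data$outcome)

# Every combination is distinct here, but unique() guards against
# repeated rows and fixes the levels in order of first appearance
data$xvar_outcome <- factor(combo, levels = unique(combo))
levels(data$xvar_outcome)
```

The new column can then be mapped to the x-axis, i.e. aes(x = xvar_outcome, ...) in the ggplot call, so that all x1 results are drawn before the x2 results.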

Hope this answers your question.

Upvotes: 1
