Reputation: 43
I am trying to add sub-group labels and order observations on the x-axis in my ggplot2 plot. There are multiple questions about this on here already, but the responses all recommend faceting (e.g. here). My plot is already faceted, so these responses don't work for me. I tried using reorder(x, by_this_variable), but this only seems to work if by_this_variable is the variable on the y-axis. Why? If I try to reorder by a different variable, I receive a warning:
argument is not numeric or boolean
To be more specific, I am plotting two points (percentages by participant, obtained in two different tasks) for each discrete x-axis value (one per participant), with an arrow connecting the two dots for each participant. This indicates whether participant behavior changed negatively or positively across tasks. My facets are two different (treatment) conditions that participants were randomly assigned to. I would now like to group these dot-arrow pairs by participant origin (a possible predictor of different responses to the treatment) and add this information as a label on the x-axis, but all I can achieve right now is to have the values sorted alphabetically (the default).
This plot might end up looking too busy. If there is a better way to plot all of this information (relative change of behavior by task, by participant, by condition, by origin) in one graph, I am open to suggestions!
My code:
library(ggplot2)

Data <- data.frame(
  Percentage = c(28.5, 20, 55.4, 30.5, 66.6, 45.4, 43.2, 43.1, 28.5, 55.4, 30.5,
                 66.6, 45.4, 20),
  Participant = c("Participant 1", "Participant 1", "Participant 2", "Participant 2",
                  "Participant 3", "Participant 3", "Participant 4", "Participant 4",
                  "Participant 5", "Participant 5", "Participant 6", "Participant 6",
                  "Participant 7", "Participant 7"),
  Origin = c("India", "India", "India", "India", "Algeria", "Algeria", "Algeria",
             "Algeria", "India", "India", "India", "India", "Algeria", "Algeria"),
  Treatment = c("Treatment A", "Treatment A", "Treatment B", "Treatment B",
                "Treatment A", "Treatment A", "Treatment B", "Treatment B",
                "Treatment A", "Treatment A", "Treatment B", "Treatment B",
                "Treatment A", "Treatment A"),
  Task = c("Task 1", "Task 2", "Task 1", "Task 2", "Task 1", "Task 2", "Task 1",
           "Task 2", "Task 1", "Task 2", "Task 1", "Task 2", "Task 1", "Task 2")
)

ggplot(Data, aes(y = Percentage, x = Participant, group = Participant)) +
  geom_point(aes(color = Task)) +
  geom_line(arrow = arrow(length = unit(0.30, "cm"), type = "closed"), size = .3) +
  facet_grid(~Treatment, scales = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
This produces the following plot:
Participants 1 & 5 are from India and 3 & 7 from Algeria, so I would like to group them together on the x-axis and add a label for origin.
EDIT:
The warning above seems to stem from the fact that Origin is a multi-level factor, while reorder appears to work only with numeric values. Setting x = reorder(Participant, as.numeric(Origin)) orders the values by Origin, but how can I add appropriate Origin labels below the plot?
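A minimal sketch of what seems to be going on, using a hypothetical toy data frame (toy, origin_code and x_ordered are names made up for illustration). By default reorder() summarises its second argument with mean() for each level of the first, and mean() of a character or factor vector returns NA with a warning. Wrapping the variable in factor() before as.numeric() yields integer codes (in alphabetical order of the origins) and also covers the case where Origin is stored as character rather than as a factor:

```r
# Toy data mirroring two of the question's columns (one row per participant).
toy <- data.frame(
  Participant = paste("Participant", 1:4),
  Origin      = c("India", "India", "Algeria", "Algeria"),
  stringsAsFactors = FALSE
)

# Integer codes for the origin: Algeria = 1, India = 2 (alphabetical levels).
origin_code <- as.numeric(factor(toy$Origin))

# reorder() sorts the levels of Participant by the mean code per participant,
# so participants from Algeria come before participants from India.
x_ordered <- reorder(toy$Participant, origin_code)

levels(x_ordered)
```

This only controls the order of the x-axis values; it does not by itself add an Origin label to the axis.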
Upvotes: 2
Views: 1739
Reputation: 6436
One suggestion is to use an ordered factor. For the levels of the factor, concatenate Origin and Participant. For the labels of the factor, concatenate Participant and Origin.
# The unique values from the column 'Origin_Participant' will act as the levels
# of the factor. The order is imposed by 'Origin', so that participants from
# the same country are grouped together.
Data$Origin_Participant <- paste(Data$Origin, Data$Participant, sep = "\n")
# The unique values from the 'Participant_Origin' column will be used for the
# factor's labels (what will end up on the plot).
Data$Participant_Origin <- paste(Data$Participant, Data$Origin, sep = "\n")
# Order the data.frame by 'Origin_Participant'. This is also important so that
# the levels correspond to the labels of the factor when creating it below.
Data <- Data[order(Data$Origin_Participant), ]
# Or in decreasing order if you need
# Data <- Data[order(Data$Origin_Participant, decreasing = TRUE), ]
# Finally, create the needed factor.
Data$Origin_Participant <- factor(x = Data$Origin_Participant,
                                  levels = unique(Data$Origin_Participant),
                                  labels = unique(Data$Participant_Origin),
                                  ordered = TRUE)
library(ggplot2)
# Reuse your code, but map the factor `Origin_Participant` to x. I think there
# is no need for a grouping variable. I also added vjust = 0.5 to center the
# labels vertically.
ggplot(Data, aes(y = Percentage, x = Origin_Participant)) +
  geom_point(aes(color = Task)) +
  geom_line(arrow = arrow(length = unit(0.30, "cm"), type = "closed"), size = .3) +
  facet_grid(~Treatment, scales = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
If you do not mind Origin appearing first in the labels, then it takes a few steps less:
Data$Origin_Participant <- factor(x = paste(Data$Origin, Data$Participant, sep = "\n"),
                                  ordered = TRUE)
ggplot(Data, aes(y = Percentage, x = Origin_Participant)) +
  geom_point(aes(color = Task)) +
  geom_line(arrow = arrow(length = unit(0.30, "cm"), type = "closed"), size = .3) +
  facet_grid(~Treatment, scales = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))
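As a possible alternative, and only as a sketch that goes beyond the factor approach above: facet_grid() accepts more than one variable on a side, so Origin could be nested inside Treatment and shown as a strip label instead of being pasted into the axis text. The data frame below rebuilds the question's values with rep() for brevity; the object name p is made up for illustration, and the line width argument is left out on purpose.

```r
library(ggplot2)

# Same values as in the question, built with named columns.
Data <- data.frame(
  Percentage  = c(28.5, 20, 55.4, 30.5, 66.6, 45.4, 43.2, 43.1,
                  28.5, 55.4, 30.5, 66.6, 45.4, 20),
  Participant = rep(paste("Participant", 1:7), each = 2),
  Origin      = rep(c("India", "Algeria", "India", "Algeria"),
                    times = c(4, 4, 4, 2)),
  Treatment   = rep(c("Treatment A", "Treatment A", "Treatment B", "Treatment B"),
                    length.out = 14),
  Task        = rep(c("Task 1", "Task 2"), times = 7)
)

# One panel per Treatment/Origin combination; free x scales and free spacing
# keep only the participants that belong to each panel.
p <- ggplot(Data, aes(x = Participant, y = Percentage, group = Participant)) +
  geom_point(aes(color = Task)) +
  geom_line(arrow = arrow(length = unit(0.30, "cm"), type = "closed")) +
  facet_grid(~ Treatment + Origin, scales = "free_x", space = "free_x") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5))

p
```

Whether this reads better than labels on the axis depends on how many origins there are; with many origins the strips may become too narrow.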
Upvotes: 0