Reputation: 13
I am plotting 4 lines using ggplot2, showing numeric values on the y-axis and months in a calendar year on the x-axis. I am using a data frame like this:
month | mean_ride_type | mean_ride_count |
---|---|---|
September | weekend_mem | 10218.25 |
September | weekday_mem | 40209.84 |
September | weekend_cas | 10399.10 |
September | weekday_cas | 24094.11 |
The month and mean_ride_type columns are character type; the mean_ride_count column is numeric type. The data repeats in the same way -- 4 mean_ride_type values per month -- for every month in a calendar year.
I can plot these lines using the following ggplot code:
plot <- ggplot(df, aes(x = month, y = mean_ride_count, group = mean_ride_type))
plot <- plot + geom_line(aes(color = mean_ride_type), size = 1) + geom_point()
plot
And the plot looks fine.
The months are sorted alphabetically, but for my plot I want them to be sorted chronologically, beginning in May. So in my ggplot code I use the scale_x_discrete() function.
plot <- ggplot(df, aes(x = month, y = mean_ride_count, group = mean_ride_type)) + scale_x_discrete(limits = month_order)
plot <- plot + geom_line(aes(color = mean_ride_type), size = 1) + geom_point()
plot
...where month_order is a character vector of month names, beginning in May, in the order I want.
But when I run the above code, the plot only shows 4 points, when it should show 48. There are no lines, when before there were 4 lines. All the points are in the month of September, and I can't understand why. The months are in the order I want on the x-axis, which is cool. Also there are a couple of alerts:
- geom_path: Each group consists of only one observation. Do you need to adjust the group aesthetic?
- Removed 44 row(s) containing missing values (geom_path).
- Removed 44 rows containing missing values (geom_point).
I am at a loss, especially since I used the scale_x_discrete() function, with the same vector used for limits, on another plot with no problems whatsoever.
I couldn't find a similar problem on the web. I've also moved the scale_x_discrete() function around within the code, but the result is the same.
Here are my data:
structure(list(month = c("September", "September", "September",
"September", "May ", "May ", "May ", "May ",
"June ", "June ", "June ", "June ", "April ",
"April ", "April ", "April ", "December ", "December ",
"December ", "December ", "July ", "July ", "July ",
"July ", "March ", "March ", "March ", "March ",
"January ", "January ", "January ", "January ", "August ",
"August ", "August ", "August ", "February ", "February ",
"February ", "February ", "October ", "October ", "October ",
"October ", "November ", "November ", "November ", "November "
), mean_ride_type = c("average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count",
"average_weekend_member_ride_count", "average_weekend_casual_ride_count",
"average_weekday_member_ride_count", "average_weekday_casual_ride_count"
), mean_ride_count = c(10218.25, 10566.5, 9980.38888888889, 5771.88888888889,
4022.86666666667, 3572.4, 3313.875, 2082.6875, 7184.83333333333,
7078.5, 6379.3125, 4361, 6739.15384615385, 5834.84615384615,
6648.23529411765, 3573.41176470588, 2974.41666666667, 1095.66666666667,
3463.15789473684, 891.157894736842, 9259.76923076923, 11316.7692307692,
8989.27777777778, 6787.66666666667, 4943.91666666667, 3934.08333333333,
4480.84210526316, 1938.10526315789, 2255.8, 647, 2805, 525.75,
11059.3571428571, 12257.3571428571, 10462.8823529412, 6944.58823529412,
1426.16666666667, 531.166666666667, 1398.5625, 234.8125, 7600.85714285714,
6012.57142857143, 8072.29411764706, 3578.58823529412, 5668.84615384615,
3860.07692307692, 5760.11764705882, 2230.47058823529)), row.names = c(NA,
-48L), class = c("tbl_df", "tbl", "data.frame"))
Upvotes: 1
Views: 623
Reputation: 18642
Trim the excess whitespace using the base function trimws.
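library(ggplot2)

# trim the whitespace, then turn month into a factor with calendar-ordered levels
df$month <- factor(trimws(df$month, "both"), levels = month.name)
# note: this drops the January-April rows, so the axis starts in May
df <- subset(df, !month %in% month.name[1:4])

ggplot(df, aes(x = month,
               y = mean_ride_count,
               group = mean_ride_type)) +
  geom_point() +
  geom_line()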
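The subset() call above removes January through April entirely. If the goal is to keep all twelve months and only start the axis in May, one possible variant is to reorder the factor levels instead. This is a minimal sketch in base R: the small data frame is a hypothetical stand-in for the question's data, and month_order is assumed to look like the vector described in the question.

```r
# Hypothetical stand-in for the question's data: a few months with trailing spaces.
df <- data.frame(
  month = c("September", "May ", "June ", "April ", "January "),
  mean_ride_count = c(10218.25, 4022.87, 7184.83, 6739.15, 2255.80),
  stringsAsFactors = FALSE
)

# Chronological order starting in May (assumed to match the asker's month_order).
month_order <- c(month.name[5:12], month.name[1:4])

# Trim the whitespace, then fix the level order; no rows are dropped.
df$month <- factor(trimws(df$month), levels = month_order)

levels(df$month)[1:3]  # "May" "June" "July"
anyNA(df$month)        # FALSE: every month matched a level
```

With month stored as a factor like this, the original ggplot() call should order the x-axis by the factor levels, so scale_x_discrete(limits = month_order) may no longer be needed.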
Your problem comes from the extra whitespace in your month column:
head(df$month)
[1] "September" "September" "September" "September" "May " "May "
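These padded values will not match the entries of the built-in month.name vector (or of your month_order limits), so ggplot treats them as missing and drops them. Only "September" has no trailing space, which is why only its 4 points are drawn: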
month.name
[1] "January" "February" "March" "April" "May" "June" "July"
[8] "August" "September" "October" "November" "December"
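A quick way to confirm the mismatch before plotting is to compare the column against month.name directly. The sketch below uses a few values copied as they appear in the question's dput() output; the variable name month is only for illustration.

```r
# A few values as they appear in the question's dput() output.
month <- c("September", "May ", "June ", "April ")

# Values that do not occur in month.name exactly as written.
unmatched <- setdiff(month, month.name)
unmatched  # "May " "June " "April "

# nchar() makes the padding visible as extra characters.
nchar(month)

# After trimming, nothing is left unmatched.
unmatched_trimmed <- setdiff(trimws(month), month.name)
length(unmatched_trimmed)  # 0
```

Any value reported as unmatched would fall outside the limits passed to scale_x_discrete() and be removed with the "missing values" warning shown in the question.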
Upvotes: 1