Reputation: 391
My data is in a tall format. I'm interested in producing line graphs using ggplot for each region. However, I keep receiving errors that the aesthetics must either be length 1 or the same as the data.
date_q <- HPF$date[1:167]
CumulativeSubset_region1 <- HPF$BaseCumulative[1:167]
ggplot(HPF[1:167, ], aes(x = date_q, y= CumulativeSubset_region1)) +
geom_line()
ggplot(data = HPF, aes(x = date, y = BaseCumulative)) + geom_line(na.rm = FALSE) + theme_light()
As you can see, the spikes are due to the fact that the date range is constant throughout all regions, but regional cumulatives are different.
#Rows 1-3 (Region 1 Sample):
dput(head(HPF[1:3, ]))
structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215,
20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110,
113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909,
0.00691832065162346)), .Names = c("region", "path", "date", "index_value",
"index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative",
"StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl",
"data.frame"))
#Rows 168:200 (Region 2 Sample):
dput(head(HPF[168:200, ]))
structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1,
1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215,
20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314,
1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA,
0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972,
0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828,
100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110,
111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839,
-0.00917785181610786, -0.00194849914020834, -0.0799443229370588
)), .Names = c("region", "path", "date", "index_value", "index",
"counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
Upvotes: 0
Views: 68
Reputation: 145965
You need to tell ggplot to draw the lines separately for each region. This is implied if you use aesthetics like linetype or color (and you'll automatically get a nice legend telling you which line is which).
If you want the region lines to look identical, you can use the group aesthetic to let ggplot know which points should be connected.
Using your little bit of sample data:
ggplot(HPF, aes(x = date, y = BaseCumulative, group = factor(region))) +
  geom_line()
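To see what the group aesthetic changes, here is a minimal sketch on a made-up two-region data frame. The name toy and its values are hypothetical stand-ins for HPF, keeping only the columns the plot needs. It counts how many separate paths ggplot builds with and without group.

```r
library(ggplot2)

# Hypothetical stand-in for HPF: two regions sharing the same three dates.
toy <- data.frame(
  region = c(1, 1, 1, 2, 2, 2),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseCumulative = c(100, 103.4, 104.2, 100, 101.4, 102.0)
)

# Without group: all rows are joined into a single path, which produces the spikes.
p_single <- ggplot(toy, aes(x = date, y = BaseCumulative)) +
  geom_line()

# With group: one path per region.
p_grouped <- ggplot(toy, aes(x = date, y = BaseCumulative, group = factor(region))) +
  geom_line()

# layer_data() returns the data ggplot computed for a layer, including a group id.
n_single <- length(unique(layer_data(p_single)$group))
n_grouped <- length(unique(layer_data(p_grouped)$group))
```

Comparing n_single with n_grouped shows whether the regions are being drawn as one connected line or as separate ones.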
As region is a categorical variable, I'd recommend making it a factor; this works well if you use an aesthetic like color or linetype.
I'd also recommend that you look into using an actual Date class. It will make your axis accurate, so you don't have giant gaps between December and January.
HPF$date = as.Date(as.character(HPF$date), format = "%Y%m%d")
HPF$region = factor(HPF$region)
ggplot(HPF, aes(x = date, y = BaseCumulative, linetype = factor(region))) +
  geom_line() +
  theme_light()
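As a quick check of the date conversion, the sketch below parses the first three date values from the question's dput output on their own. Note that the month code in the format string is lowercase %m; uppercase %M means minutes and would not give the intended dates. The names raw_dates and parsed are only for illustration.

```r
# Dates stored as numbers in YYYYMMDD form, as in the question's dput output.
raw_dates <- c(20140215, 20140515, 20140815)

# %Y = four-digit year, %m = month, %d = day of month.
parsed <- as.Date(as.character(raw_dates), format = "%Y%m%d")

# Human-readable form of the parsed dates.
format(parsed)

# Gaps between consecutive observations, in days.
diff(parsed)
```

Once the column is a real Date, the x axis is spaced by elapsed time rather than by the numeric distance between values like 20141115 and 20150215.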
Upvotes: 2
Reputation: 15072
You can just assign the colour aesthetic to your region variable, if the region is made into a categorical variable with factor. This is me interpreting your desired output as a single line for each region. I would also recommend fixing your date formatting to make a prettier plot, but that's not the question. Using region and region2 as objects from your dput:
library(tidyverse)
HPF <- bind_rows(region, region2) %>%
  mutate(region = factor(region))
ggplot(data = HPF) +
  geom_line(aes(x = date, y = BaseCumulative, colour = region), na.rm = FALSE) +
  theme_light()
You can get the same effect by assigning region to other aesthetics too, like linetype, and you can control the colours generated with different colour scales.
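As one possible way to control the colours, here is a small sketch using scale_colour_manual. The data frame toy is a hypothetical stand-in for HPF, and the two colour names are arbitrary choices, not anything from the original data.

```r
library(ggplot2)

# Hypothetical stand-in for HPF, with region already converted to a factor.
toy <- data.frame(
  region = factor(c(1, 1, 1, 2, 2, 2)),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseCumulative = c(100, 103.4, 104.2, 100, 101.4, 102.0)
)

# Named vector mapping each factor level to a colour (arbitrary choices).
region_colours <- c("1" = "steelblue", "2" = "firebrick")

p <- ggplot(toy, aes(x = date, y = BaseCumulative, colour = region)) +
  geom_line() +
  scale_colour_manual(values = region_colours) +
  theme_light()

# Colours actually assigned to the drawn lines.
used_colours <- unique(layer_data(p)$colour)
```

Naming the vector by factor level keeps each region tied to its colour even if the order of the levels changes.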
Upvotes: 0