datanalyst

Reputation: 391

How to Subset Tall Data for Graphing

My data is in a tall format. I'm interested in producing line graphs using ggplot for each region. However, I keep receiving errors that the aesthetics must either be length 1 or the same as the data.

Hard-coded solution:

date_q <- HPF$date[1:167]
CumulativeSubset_region1 <- HPF$BaseCumulative[1:167]
ggplot(HPF[1:167, ], aes(x = date_q, y= CumulativeSubset_region1)) + 
  geom_line() 

Without hard-coding:

ggplot(data = HPF, aes(x = date, y = BaseCumulative)) + geom_line(na.rm = FALSE) + theme_light()

The resulting plot has spikes because every region covers the same date range while the regional cumulatives differ.

Data:

#Rows 1-3 (Region 1 Sample): 
dput(head(HPF[1:3, ]))
    structure(list(region = c(1, 1, 1), path = c(1, 1, 1), date = c(20140215, 
    20140515, 20140815), index_value = c(1, 1.033852765, 1.041697122
    ), index = 0:2, counter = 1:3, BaseQoQ = c(NA, 0.033852765, 0.00758749917354029
    ), BaseCumulative = c(100, 103.3852765, 104.1697122), StressCumulative = c(110, 
    113.3852765, 114.1697122), StressQoQ = c(NA, 0.0307752409090909, 
    0.00691832065162346)), .Names = c("region", "path", "date", "index_value", 
    "index", "counter", "BaseQoQ", "BaseCumulative", "StressCumulative", 
    "StressQoQ"), row.names = c(NA, -3L), class = c("tbl_df", "tbl", 
    "data.frame"))

#Rows 168:200 (Region 2 Sample):
dput(head(HPF[168:200, ]))
    structure(list(region = c(2, 2, 2, 2, 2, 2), path = c(1, 1, 1, 
    1, 1, 1), date = c(20140215, 20140515, 20140815, 20141115, 20150215, 
    20150515), index_value = c(1, 1.014162265, 1.01964828, 1.009372314, 
    1.007210703, 1.018695493), index = 0:5, counter = 1:6, BaseQoQ = c(NA, 
    0.014162265, 0.00540940556489744, -0.0100779515854232, -0.0021415398163972, 
    0.0114025694582001), BaseCumulative = c(100, 101.4162265, 101.964828, 
    100.9372314, 100.7210703, 101.8695493), StressCumulative = c(110, 
    111.4162265, 111.964828, 110.9372314, 110.7210703, 101.8695493
    ), StressQoQ = c(NA, 0.0128747863636363, 0.00492389230216839, 
    -0.00917785181610786, -0.00194849914020834, -0.0799443229370588
    )), .Names = c("region", "path", "date", "index_value", "index", 
    "counter", "BaseQoQ", "BaseCumulative", "StressCumulative", "StressQoQ"
    ), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
    ))

Upvotes: 0

Views: 68

Answers (2)

Gregor Thomas

Reputation: 145965

You need to tell ggplot to do the lines separately for each region. This will be implied if you use aesthetics like linetype or color (and you'll automatically get a nice legend telling you which line is which).

If you want the aesthetics of the region lines to be identical, you can use the group aesthetic to let ggplot know which points should be connected.

Using your little bit of sample data:

ggplot(HPF, aes(x = date, y = BaseCumulative, group = factor(region))) + 
  geom_line() 
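To try this without the full HPF table, here is a minimal sketch that rebuilds a few rows from the dput output in the question (three quarters for each of two regions) and checks how many separate paths ggplot computes with and without group. The names hpf_small and n_groups are hypothetical, introduced only for this illustration.

```r
library(ggplot2)

# A few rows copied from the question's dput output (hpf_small is a made-up name)
hpf_small <- data.frame(
  region = c(1, 1, 1, 2, 2, 2),
  date = rep(c(20140215, 20140515, 20140815), times = 2),
  BaseCumulative = c(100, 103.3852765, 104.1697122,
                     100, 101.4162265, 101.964828)
)

# Without group, geom_line() joins all six points into a single path
p_one <- ggplot(hpf_small, aes(x = date, y = BaseCumulative)) +
  geom_line()

# With group, each region gets its own path
p_grouped <- ggplot(hpf_small, aes(x = date, y = BaseCumulative,
                                   group = factor(region))) +
  geom_line()

# Hypothetical helper: count the distinct groups computed for the line layer
n_groups <- function(p) length(unique(ggplot_build(p)$data[[1]]$group))
n_groups(p_one)
n_groups(p_grouped)
```

The first plot should report a single group, which is what produces the spikes, and the second should report one group per region.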


As region is a categorical variable, I'd recommend making it a factor - this will work well if you use an aesthetic like color or linetype.

I'd also recommend that you look into using an actual Date class - it will make your axis accurate so you don't have giant gaps between December and January.

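# Convert the numeric yyyymmdd values to Date; %m is month (%M would be minutes)
HPF$date = as.Date(as.character(HPF$date), format = "%Y%m%d")
# Treat region as categorical so it can drive linetype
HPF$region = factor(HPF$region)
ggplot(HPF, aes(x = date, y = BaseCumulative, linetype = region)) + 
  geom_line() +
  theme_light()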
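The axis-gap issue can be seen with base R alone. This small sketch takes three consecutive quarterly dates from the question's data and compares the spacing when they are treated as plain numbers versus real dates. The variable names date_num and date_real are made up for the example.

```r
# Three consecutive quarterly dates from the question, stored as numbers
date_num <- c(20140815, 20141115, 20150215)

# As numbers, the step across the year boundary is far larger than the one before it
diff(date_num)
# 300 9100

# Parse via character; note %m is month, whereas %M would mean minutes
date_real <- as.Date(as.character(date_num), format = "%Y%m%d")

# As dates, both quarterly steps are the same length
diff(date_real)
# Time differences in days: 92 92
```

On a numeric axis the second step would be drawn about thirty times wider than the first, which is the gap between December and January described above.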


Upvotes: 2

Calum You

Reputation: 15072

You can map the colour aesthetic to region once region has been converted to a categorical variable with factor(). I'm interpreting your desired output as a single line for each region. I'd also recommend fixing your date formatting to make a prettier plot, but that's not the question. Using region and region2 as the two objects from your dput output:

library(tidyverse)
HPF <- bind_rows(region, region2) %>%
  mutate(region = factor(region))

ggplot(data = HPF) +
  geom_line(aes(x=date, y= BaseCumulative, colour = region), na.rm = FALSE) +
  theme_light()


You can get the same effect by assigning region to other aesthetics too, like linetype, and you can control the colours generated with different colour scales.
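As one possible way to control the colours, the sketch below uses scale_colour_manual() with a named vector keyed on the factor levels. The data frame hpf_small is a made-up name holding a few rows copied from the question's dput output, and the colour choices are arbitrary.

```r
library(ggplot2)

# A few rows copied from the question's dput output (hpf_small is a made-up name)
hpf_small <- data.frame(
  region = factor(c(1, 1, 1, 2, 2, 2)),
  date = as.Date(rep(c("2014-02-15", "2014-05-15", "2014-08-15"), times = 2)),
  BaseCumulative = c(100, 103.3852765, 104.1697122,
                     100, 101.4162265, 101.964828)
)

# Arbitrary colours; the names must match the levels of the region factor
region_colours <- c("1" = "steelblue", "2" = "firebrick")

p <- ggplot(hpf_small, aes(x = date, y = BaseCumulative, colour = region)) +
  geom_line() +
  scale_colour_manual(values = region_colours, name = "Region") +
  theme_light()
```

Printing p draws the plot. Swapping scale_colour_manual() for a palette-based scale such as scale_colour_brewer() is another option when there are many regions.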

Upvotes: 0
