I have a data frame holding percentile values of price variation for a set of products over several months. The data frame looks something like this:
DATA P10 P25 P50 P75 P90
1 2011-03-01 0.034638180 0.086482130 0.133986300 0.177072700 0.233044900
2 2011-04-01 -0.185378000 -0.112070500 -0.064632480 -0.027086950 0.036643230
3 2011-05-01 0.008258164 0.053702510 0.094340370 0.137678700 0.270847900
4 2011-06-01 -0.105608500 -0.072065040 -0.019818160 0.018149950 0.069389460
5 2011-07-01 -0.080303930 -0.040885830 -0.006315288 0.030778970 0.084747610
6 2011-08-01 0.001524279 0.052229100 0.075928880 0.126691500 0.167735600
7 2011-09-01 -0.097216090 -0.066777680 -0.040682890 -0.014226140 0.034411750
The code I wrote to create my plot is:
ggplot()+
geom_line(data = dataPerc, aes(x = dataPerc$DATA, y =dataPerc$P10,color="P10"),size=1)+
geom_line(data = dataPerc, aes(x = dataPerc$DATA, y =dataPerc$P25,color='P25'),size=1)+
geom_line(data = dataPerc, aes(x = dataPerc$DATA, y =dataPerc$P50,color = "P50"),size = 1)+
geom_line(data = dataPerc, aes(x = dataPerc$DATA, y =dataPerc$P75,color= "P75"),size=1)+
geom_line(data = dataPerc, aes(x = dataPerc$DATA, y =dataPerc$P90,color="P90"),size=1)+
scale_x_date(date_labels="%b %y",date_breaks ="1 month")+
theme(axis.text.x = element_text(angle = 90))+
labs(color='Percentile')+
scale_y_continuous(labels = function(x) paste0(x*100, "%"))+
xlab("Month/Year")+
ylab("% fat. ")
Basically, I want to create the same plot with a loop that replaces the sequence of geom_line() calls above. Thanks.
Upvotes: 0
Views: 802
Reputation: 591
Once your data is in long format, the possibilities are endless. I simplified your code a bit and faceted the data by your original column names (p10, p25, p50, and so on). This lets you plot a separate line for each category, so you can observe the trend from March to September within each facet. I organized the facets into one column. Only the month names appear on the x-axis since you only have data for one year. Feel free to adjust the ncol = ... argument to find the proper presentation.
If faceting is not your style, then drop the call to facet_wrap() altogether and try inserting col = factor(perc) inside of aes(). This will draw all the lines on one plot; you also get a nice legend for free. I will demonstrate both methods below.
# Here is how to avoid looping and layering on multiple geoms
library(tidyverse)
library(lubridate)
df <- tribble(
~date, ~p10, ~p25, ~p50, ~p75, ~p90,
"2011-03-01", 0.034638180, 0.086482130, 0.133986300, 0.177072700, 0.233044900,
"2011-04-01", -0.185378000, -0.112070500, -0.064632480, -0.027086950, 0.036643230,
"2011-05-01", 0.008258164, 0.053702510, 0.094340370, 0.137678700, 0.270847900,
"2011-06-01", -0.105608500, -0.072065040, -0.019818160, 0.018149950, 0.069389460,
"2011-07-01", -0.080303930, -0.040885830, -0.006315288, 0.030778970, 0.084747610,
"2011-08-01", 0.001524279, 0.052229100, 0.075928880, 0.126691500, 0.167735600,
"2011-09-01", -0.097216090, -0.066777680, -0.040682890, -0.014226140, 0.034411750)
# Some quick data preparation
long_df <- df %>%
mutate(date = ymd(date)) %>%
pivot_longer(-date, names_to = "perc", values_to = "p_scores")
# Here is a subset of the data frame in long format
# A tibble: 35 x 3
date perc p_scores
<date> <chr> <dbl>
1 2011-03-01 p10 0.0346
2 2011-03-01 p25 0.0865
3 2011-03-01 p50 0.134
4 2011-03-01 p75 0.177
5 2011-03-01 p90 0.233
6 2011-04-01 p10 -0.185
7 2011-04-01 p25 -0.112
8 2011-04-01 p50 -0.0646
9 2011-04-01 p75 -0.0271
10 2011-04-01 p90 0.0366
# … with 25 more rows
# Simplified code
ggplot(long_df, aes(x = date, y = p_scores)) +
geom_line(size = 1) +
scale_x_date("Month",
date_breaks = "1 month",
date_labels = '%B') +
scale_y_continuous("% Fat.", labels = function(x) paste0(x*100, "%")) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
facet_wrap(~ perc, ncol = 1)
Below is my other recommendation if you want to stack the lines onto one plot. It appears each line moves in tandem over time without much volatility. I included the code as well for reproducibility.
ggplot(long_df, aes(x = date, y = p_scores, col = factor(perc))) +
geom_line(size = 1) +
scale_x_date("Month",
date_breaks = "1 month",
date_labels = '%B') +
scale_y_continuous("% Fat.", labels = function(x) paste0(x*100, "%")) +
labs(color = "Score \nType:") + # This is a generic legend title
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
Also, you can achieve the same result without converting perc to a factor variable inside of aes(). But I digress.
I hope this helps!
Upvotes: 0
Reputation: 1179
Here's an answer without the loop you said you wanted. ggplot2 does not work well with loops.
# Read in your data - changed `DATA` to `date`
a <-
"date P10 P25 P50 P75 P90
1 2011-03-01 0.034638180 0.086482130 0.133986300 0.177072700 0.233044900
2 2011-04-01 -0.185378000 -0.112070500 -0.064632480 -0.027086950 0.036643230
3 2011-05-01 0.008258164 0.053702510 0.094340370 0.137678700 0.270847900
4 2011-06-01 -0.105608500 -0.072065040 -0.019818160 0.018149950 0.069389460
5 2011-07-01 -0.080303930 -0.040885830 -0.006315288 0.030778970 0.084747610
6 2011-08-01 0.001524279 0.052229100 0.075928880 0.126691500 0.167735600
7 2011-09-01 -0.097216090 -0.066777680 -0.040682890 -0.014226140 0.034411750
"
df <- read.table(text = a, header = TRUE)
library(tidyr)
library(dplyr)
library(ggplot2)
# make the data tidy. ggplot2 needs tidy data (one observation per row)
df <- df %>% pivot_longer(cols = -date, names_to = "pct")
# format date as date
df$date <- as.Date(df$date)
ggplot(df, aes(x = date, y = value, color = pct)) +
geom_line(size=1) +
scale_x_date(date_labels="%b %y",date_breaks ="1 month") +
theme(axis.text.x = element_text(angle = 90)) +
labs(color='Percentile') +
scale_y_continuous(labels = function(x) paste0(x * 100, "%")) +
xlab("Month/Year") +
ylab("% fat. ")
[![enter image description here][1]][1]
[1]: https://i.sstatic.net/OCk8O.png
Upvotes: 0
Reputation: 145755
Don't use a loop - convert your data from wide to long.
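library(ggplot2)
# names_to = "Percentile" names the new key column, so map color to that column
long_data = tidyr::pivot_longer(your_data, -DATA, names_to = "Percentile")
ggplot(long_data, aes(x = DATA, y = value, color = Percentile)) +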
geom_line(size = 1) +
theme(axis.text.x = element_text(angle = 90)) +
labs(x = "Month/Year", y = "% fat. ") +
scale_y_continuous(labels = scales::label_percent(accuracy = 0.1))
Also, don't use data$column inside aes(); it expects unquoted column names.
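If a loop-like construction is still wanted, and the column names are only available as strings, one possible sketch is to build a list of layers with lapply() and look each column up through the .data pronoun instead of data$column. This is not how the answer above solves the problem, and pivoting to long format is still simpler. The names pct_cols and line_layers are made up for illustration, and the data frame holds only the first three rows of the question's data.

```r
library(ggplot2)

# First three rows of the question's data, kept in wide format
dataPerc <- data.frame(
  DATA = as.Date(c("2011-03-01", "2011-04-01", "2011-05-01")),
  P10 = c(0.034638180, -0.185378000, 0.008258164),
  P25 = c(0.086482130, -0.112070500, 0.053702510),
  P50 = c(0.133986300, -0.064632480, 0.094340370),
  P75 = c(0.177072700, -0.027086950, 0.137678700),
  P90 = c(0.233044900, 0.036643230, 0.270847900)
)

# Every column except the date column becomes one line (hypothetical helper name)
pct_cols <- setdiff(names(dataPerc), "DATA")

# One geom_line() layer per column; .data[[pct_col]] finds the column by its
# string name, and the constant pct_col string supplies the legend entry
line_layers <- lapply(pct_cols, function(pct_col) {
  geom_line(aes(x = DATA, y = .data[[pct_col]], color = pct_col))
})

# Adding a list to a plot adds each layer in the list
p <- ggplot(dataPerc) +
  line_layers +
  labs(x = "Month/Year", y = "% fat.", color = "Percentile")

print(p)
```

Each call to the anonymous function gets its own pct_col, so every layer keeps the column it was created with. A plain for loop that reuses one variable inside aes() would not have that property, because aes() evaluates its arguments lazily.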
Upvotes: 1