Reputation: 189
Is it possible to duplicate this Excel pivot chart with ggplot2
or have something relatively similar? I really like that the chart displays the prices and the year-to-year percent changes on the legend below.
Here's the dput()
output of my sample dataset:
price <- structure(list(selling_date = structure(c(17869, 17870, 17870,
17872, 17872, 17872, 17875, 17878, 17879, 17879, 17882, 17882,
17885, 17889, 17892, 17893, 17893, 17893, 17896, 18232, 18232,
18233, 18234, 18235, 18235, 18235, 18236, 18236, 18239, 18239,
18240, 18242, 18242, 18243, 18243, 18246, 18250, 18250, 18254,
18257, 18260, 18260, 18261, 18261, 18261, 18598, 18599, 18599,
18600, 18607, 18612, 18613, 18614, 18617, 18617, 18619, 18619,
18619, 18624, 18626, 18627, 18964, 18967, 18968, 18970, 18970,
18976, 18977, 18977, 18981, 18983, 18990), class = "Date"), selling_price = c(800000,
625000, 105000, 580000, 419000, 360000, 795000, 320000, 790000,
625000, 779450, 415000, 615000, 555000, 500000, 710000, 655000,
600000, 639950, 680000, 280000, 365000, 730000, 581000, 360000,
530000, 950000, 825000, 550000, 300000, 385000, 590000, 590000,
660000, 410000, 510000, 799999, 575000, 690000, 700000, 565000,
735000, 866250, 339000, 360000, 674900, 560000, 435000, 1485000,
840000, 525000, 900000, 886000, 647000, 775000, 648000, 650000,
349000, 675000, 1225000, 1260000, 585000, 1100000, 1400000, 960000,
695000, 1000000, 1050000, 630000, 751000, 1225000, 1930721),
id_num = c(1364645, 1330850, 1385154, 1337665, 1349409, 1360807,
1356805, 1389378, 1333399, 1286526, 1380444, 1371559, 1325930,
1376221, 1386441, 1334435, 1333581, 1372726, 1364979, 1508940,
1534939, 1526490, 1522480, 1504380, 1536453, 1538182, 1490032,
1533847, 1504948, 1529791, 1521259, 1537746, 1538750, 1538455,
1528390, 1526742, 1538271, 1512930, 1518796, 1517900, 1538821,
1511886, 1544288, 1511768, 1535256, 1680111, 1683767, 1675425,
1639303, 1680942, 1669765, 1683493, 1685799, 1637667, 1682404,
1690692, 1677272, 1651257, 1687505, 1674430, 1601173, 1798143,
1857616, 1850499, 1857149, 1840874, 1831562, 1843883, 1859947,
1858923, 1856840, 1854640), year = c(2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018,
2018, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019, 2019,
2019, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020, 2020,
2020, 2020, 2020, 2020, 2020, 2020, 2020, 2021, 2021, 2021,
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021)), row.names = c(NA,
-72L), class = c("tbl_df", "tbl", "data.frame"))
I'm only able to recreate a basic line plot with the same data in ggplot2
, which is in the chart below.
The RStudio library I have loaded for this is library(tidyverse)
. Getting average and median price is easy. Here's the code I'm using for that:
price_groups <- price %>%
group_by(year) %>%
summarize(avg_sp = mean(selling_price),
med_sp = median(selling_price)
Getting the percent change year-to-year is in the code below. Unless there is another way, I've found I must create an object of my grouped summaries first before I can use the lag()
or lead()
functions on my groups.
price_groups <- price_groups %>%
mutate(lag_avg_sp = lag(avg_sp),
avg_pct_change = (avg_sp - lag_avg_sp) / lag_avg_sp,
lag_med_sp = lag(med_sp),
med_pct_change = (med_sp - lag_med_sp) / lag_med_sp) %>%
select(-c(lag_avg_sp, lag_med_sp))
Here's the code I've used so far in ggplot2
:
options(scipen = 999)
ggplot(price_groups, aes(x = year)) +
geom_line(aes(y = avg_sp, color = "Avg SP"), size = 2) +
geom_line(aes(y = med_sp, color = "Median SP"), size = 2) +
scale_y_continuous(breaks = seq(0, 1100000, by = 250000),
limits = c(0, 1100000),
labels = scales::dollar_format()) +
scale_color_manual(values = c("Avg SP" = "steelblue", "Median SP" = "dark gray")) +
labs(x = "Year",
y = "Selling Price",
color = "Price Type",
title = "Annual Change in Price\n") +
theme(legend.position = "bottom",
plot.title = element_text(hjust = 0.5))
I don't know enough about ggplot2
or the tidyverse
functionality to duplicate this Excel chart or create something relatively similar. I'd like to display average and median price along with average and median percent changes in the legend on the bottom on the same plot. Is this possible?
I'd prefer not to display the numbers on the lines themselves (above or below) as I think it looks cluttered.
Also, the date in the Excel chart is in wide format rather than long. After creating the price_groups
object, I tried messing with pivot_wider()
to recreate the format, but I don't have enough experience with this feature to figure it out. Is there a way to make the variables avg_sp
, avg_pct_change
, med_sp
, and med_pct_change
row names with the years as column names and have the values be the prices and percentages?
If recreating the Excel pivot chart isn't possible, I'd be fine with an option using pivot_wider()
to create a table and make a separate image I could append to the graph.
Upvotes: 0
Views: 228
Reputation: 160417
Here's a shot. It's not perfect, and I don't know how to align the years in the plot with the columns in the table ...
Several packages needed for this:
library(ggplot2)
library(dplyr)
library(tidyr) # pivot_*
# library(gridExtra) # tableGrob
library(patchwork)
Data reshaping:
price_summ <- price %>%
mutate(year = format(selling_date, format = "%Y")) %>%
group_by(year) %>%
summarize(
"Avg SP" = mean(selling_price),
"Median SP" = median(selling_price),
.groups = "drop"
) %>%
arrange(year) %>%
mutate(
across(
`Avg SP`:`Median SP`,
~ c(NA, zoo::rollapplyr(., 2, FUN = function(z) diff(z) / z[1])),
.names = "{sub('SP', '% Change', .col)}"
)
) %>%
pivot_longer(-year, names_to = "Price Type", values_to = "Selling Price")
price_summ
# # A tibble: 16 x 3
# year `Price Type` `Selling Price`
# <chr> <chr> <dbl>
# 1 2018 Avg SP 573074.
# 2 2018 Median SP 615000
# 3 2018 Avg % Change NA
# 4 2018 Median % Change NA
# 5 2019 Avg SP 574086.
# 6 2019 Median SP 578000
# 7 2019 Avg % Change 0.00177
# 8 2019 Median % Change -0.0602
# 9 2020 Avg SP 783431.
# 10 2020 Median SP 674950
# 11 2020 Avg % Change 0.365
# 12 2020 Median % Change 0.168
# 13 2021 Avg SP 1029702.
# 14 2021 Median SP 1000000
# 15 2021 Avg % Change 0.314
# 16 2021 Median % Change 0.482
And two plot components (with a little more reshaping in gg2
):
gg1 <- dplyr::filter(price_summ, !grepl("%", `Price Type`)) %>%
ggplot(aes(as.integer(year), `Selling Price`)) +
geom_line(aes(group = `Price Type`, color = `Price Type`)) +
scale_y_continuous(labels = scales::dollar_format()) +
labs(x = NULL) +
theme(legend.position = "bottom")
gg2 <- price_summ %>%
mutate(
value2 = case_when(
is.na(`Selling Price`) ~ "",
grepl("%", `Price Type`) ~ scales::percent(`Selling Price`, accuracy = 0.01),
TRUE ~ scales::dollar(`Selling Price`)
)
) %>%
pivot_wider(`Price Type`, names_from = "year", values_from = "value2") %>%
arrange(grepl("%", `Price Type`), `Price Type`) %>%
rename(` ` = `Price Type`) %>%
gridExtra::tableGrob(., rows = NULL)
gg1 / gg2
Upvotes: 1