Reputation: 63
I bring you a problem in which I have found myself when doing a job. I need to perform a 'drilldown' that contains around 100 columns and I am running into an efficiency issue. The data set contains the years from 1913 to 2021.
# Load packages
library(tidyverse)
library(lubridate)
library(highcharter)
# Example Data Frame
df <- data.frame(year = c('1913-01-01',
'1913-01-02',
'1913-01-03',
'1913-01-04',
'1913-01-05',
'1913-01-06',
'1913-01-07',
'1913-01-08',
'1913-01-09',
'1913-01-10',
'1913-01-11',
'1913-01-12',
'1914-01-01',
'1914-01-02',
'1914-01-03',
'1914-01-04',
'1914-01-05',
'1914-01-06',
'1914-01-07',
'1914-01-08',
'1914-01-09',
'1914-01-10',
'1914-01-11',
'1914-01-12'),
value = 1:12)
I need to create a drilldown where the average of the values in each year are shown, in this example they would be only 2 but in the original data frame there would be around 100 columns that should be shown in the graph
The way I can think of is the following:
df1913 <- df %>%
filter(year(year) == 1913) %>%
data.frame()
df1914 <- df %>%
filter(year(year) == 1914) %>%
data.frame()
dfDD <- tibble(name = year(df$year),
y = mean(df$value),
drilldown = substr(name, 1, 4))
I would have to repeat this code for each data frame I would need to create and I recognize that it is not efficient.
Example:
highchart() %>%
hc_chart(type = "column") %>%
hc_title(text = "Example Drilldown") %>%
hc_xAxis(type = "category") %>%
hc_legend(enabled = FALSE) %>%
hc_plotOptions(series = list(boderWidth = 2,
dataLabels = list(enabled = TRUE))) %>%
hc_add_series(data = dfDD,
name = "Mean",
colorByPoint = TRUE) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = list(list(id = "1913",
data = list_parse2(df1913)),
list(id = '1914',
data = list_parse2(df1914))))
UPDATE: May 07, 2022
EDIT:
It was necessary for me to reformulate my question since I have expressed myself in an incorrect way and that is why I did not get the answer to my main problem. First of all I thank everyone who has read and commented on his opinion about my problem.
Reformulated question:
I need to create a 'drilldown' style chart. My original dataset has only 2 columns:
For each year I have 12 values (1 for each month of the year). The problem is that the data set has the years from 1913 to 2021 registered in the 'years' column. It would be around 1300 rows. If I don't count wrong, it's 109 years where each year has 12 values and only the year 2021 has 7.
In the way that I had explained before, my way of solving the problem was inefficient and very long. Having to create a data frame for each subset of the year would be tedious and I would have more than 100 data frames created.
Is there a more efficient way to solve my problem!? If so, could you explain it to me with an example? Thank you very much and sorry for expressing myself poorly at first.
Upvotes: 0
Views: 125
Reputation:
Starting from your example dataframe df, process the data slightly to add variable year as character:
df <- df %>%
mutate(YMD = ymd(year),
year = as.character(year(YMD)))
... and do the grouping and aggregation during the composition of your plot (no need to cram your workspace with dataframelets):
library(dplyr)
## (preceding plot instructions omitted)
## ...
hc_add_series(data = df %>%
group_by(year) %>%
summarise(y = mean(value, na.rm = TRUE)) %>%
mutate(name = year, drilldown = year) #, ... other args
) %>%
hc_drilldown(allowPointDrilldown = TRUE,
series = list(
df %>%
group_by(year) %>%
summarise(drilldown_series =
list(id = first(year),
data = list_parse2(
df = .[c('year', 'value')] )
)
) %>%
pull(drilldown_series)
)
If your dataframe has a lot of columns, you might want to stack these, e.g with pivot_longer
, i. e. you narrow these columns down to two: one contains the variable name, the second its value:
library(tidyr)
df %>%
pivot_longer(cols = first_column_to_stack : last_col_to_stack,
names_to = "variable",
values_to = "value"
)
... now you can dynamically filter
the long-format dataframe by variable (and year and ...) before feeding it as plot arguments.
Upvotes: 1