ANieder

Reputation: 223

highcharter hcaes "group" usage while plotting large amounts of data with highchart2()

I am trying to plot large datasets (> 50k rows of data) as scatterplots using the highcharter package. After some reading, I found out that the highchart2() function includes the boost module from Highcharts, which should improve performance considerably when plotting large amounts of data. Take the following example:

library(highcharter) # I'm using the latest version from github (0.5.0.9999)

x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No",2500)))

highchart() %>%
  hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))

This correctly creates a scatterplot, but it already shows some performance issues because of the amount of data. This is why I switched to highchart2(), but to my surprise the plot does not show any data points when I try:

highchart2() %>%
  hc_add_series(data = x, type = "scatter", hcaes(x=a, y=b, group=cat))

After some more searching and reading, I found out that the plot renders much faster when the data is passed through list_parse2(), so I tried this:

highchart2() %>%
  hc_add_series(data = list_parse2(x), type = "scatter", hcaes(x=a, y=b, group=cat))

Of course, this doesn't work, because list_parse2() changes the structure of the input data and strips the names of the variables I was passing to hcaes(). Then I tried this:

highchart2() %>%
  hc_add_series(data = list_parse2(x), type = "scatter")

This gives me a very quickly rendered plot, BUT I cannot get the grouping to work that would differentiate between "Yes" and "No" points, so all points are now the same color.
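To see why the grouping information disappears, it helps to compare what list_parse() and list_parse2() (both exported by highcharter) return for a tiny, made-up data frame. This is only an illustration: `df`, `named` and `unnamed` are hypothetical names, not part of the original code. list_parse() turns every row into a named list, while list_parse2() drops the names, so each point becomes a bare array and there are no column names left for hcaes() to map.

```r
library(highcharter)

# Hypothetical two-row data frame, used only to inspect the output structure
df <- data.frame(a = c(1.5, 2.5), b = c(3, 4))

named   <- list_parse(df)   # one element per row, each a named list (a = ..., b = ...)
unnamed <- list_parse2(df)  # one element per row, each an unnamed list of values

str(named[[1]])
str(unnamed[[1]])
```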

So my question is: how can I efficiently plot large datasets with highcharter while keeping the ability to assign a variable to the "group" parameter in hcaes()?

Thanks in advance for your help.

Upvotes: 3

Views: 2238

Answers (1)

jbkunst

Reputation: 3029

A small disclaimer: hcaes() only works when the data object is a data.frame.

Now, you can use dplyr to build a data frame of series with the group_by() function, and then use the auxiliary function hc_add_series_list() to add more than one series at once.

library(highcharter)  # I'm using the latest version from github (0.5.0.9999)

x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)))

library(dplyr)

xseries <- x %>% 
  # use `name` to name each series after the value of the `cat` variable
  group_by(name = cat) %>% 
  # keep only the x/y columns so each point is a plain [a, b] array
  do(data = list_parse2(select(., a, b))) %>%
  # add the type of each series
  mutate(type = "scatter")

# A data frame of series
xseries
#> Source: local data frame [2 x 3]
#> Groups: <by row>
#> 
#> # A tibble: 2 x 3
#>     name           data    type
#>   <fctr>         <list>   <chr>
#> 1     No <list [2,500]> scatter
#> 2    Yes <list [2,500]> scatter

And finally:

highchart2() %>% 
  hc_add_series_list(xseries)
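If you would rather not depend on dplyr's do() (newer dplyr versions mark it as superseded), the same list of series can be sketched in base R with split() and lapply(). This assumes that hc_add_series_list() accepts a plain list of series as well as a data frame of series; each series here is a list with `name`, `type` and `data` fields, mirroring the columns of `xseries` above.

```r
library(highcharter)

x <- data.frame(a = rnorm(5000),
                b = rnorm(5000),
                cat = c(rep("Yes", 2500), rep("No", 2500)))

# One data frame per level of `cat`, keeping only the x/y columns
groups <- split(x[, c("a", "b")], x$cat)

# Build one scatter series per group, with points as unnamed [a, b] arrays
xseries <- lapply(names(groups), function(g) {
  list(name = g, type = "scatter", data = list_parse2(groups[[g]]))
})

highchart2() %>% 
  hc_add_series_list(xseries)
```

Each group becomes its own series, so Highcharts colors "Yes" and "No" separately while the points themselves stay in the compact array form.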


Upvotes: 4
