fgootkind

Sunburst Chart from R in Plotly not showing all values in outermost ring

I'm trying to make a sunburst chart from cell spatial data for one slide. Here's the basic format of the data frame I'm using:

structure(list(
  slide = c("LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095", "LU095", "LU095", "LU095",
            "LU095", "LU095", "LU095", "LU095"),
  stroma_bins = structure(
    c(1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L,
      7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 9L, 9L,
      9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L),
    levels = c("0-10% Stroma", "10-20% Stroma", "20-30% Stroma",
               "30-40% Stroma", "40-50% Stroma", "50-60% Stroma",
               "60-70% Stroma", "70-80% Stroma", "80-90% Stroma",
               "90-100% Stroma"),
    class = "factor"),
  cd8_percent_bins = structure(
    c(1L, 1L, 3L, 1L, 2L, 1L, 2L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
      2L, 3L, 4L, 5L, 6L, 7L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L,
      3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
    levels = c("0-2% CD8+ Cells", "2-4% CD8+ Cells", "4-6% CD8+ Cells",
               "6-8% CD8+ Cells", "8-10% CD8+ Cells", "10-15% CD8+ Cells",
               "15-20% CD8+ Cells", ">20% CD8+ Cells"),
    class = "factor"),
  Freq = c(8L, 5L, 1L, 7L, 1L, 7L, 2L, 15L, 4L, 4L, 2L, 15L, 4L, 3L, 2L,
           12L, 15L, 1L, 4L, 2L, 1L, 1L, 16L, 12L, 8L, 8L, 4L, 1L, 3L, 1L,
           14L, 4L, 17L, 6L, 9L, 11L, 5L, 2L, 51L, 18L, 24L, 24L, 17L, 32L,
           21L, 11L)),
  row.names = c(NA, -46L), class = c("data.table", "data.frame"))

I'm using Plotly in R, but for some reason the outermost ring of the sunburst chart is only drawn for one region.

[Image: faulty sunburst plot]

Here's the code I have for it so far.

library(plotly)

fig <- plot_ly(
  labels = df2$labels,
  parents = df2$parents,
  values = df2$values,
  type = 'sunburst',
  branchvalues = 'total')

fig

Answers (1)

Kat

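What you did to aggregate your data into the data set used in your plot isn't in your question. However, I can see that you have three levels and you didn't use the ids argument. Without ids, Plotly matches each entry in parents against labels, so the labels must be unique. Yours most likely aren't, because the same CD8 bin (e.g. "0-2% CD8+ Cells") appears under several stroma bins.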
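As a quick illustration, consider a hypothetical five-sector frame (`toy`, with made-up values, not your aggregated data). `duplicated()` finds the repeated labels. Giving each sector an id that includes its parent, and pointing `parents` at those ids, removes the ambiguity:

```r
# Hypothetical five-sector hierarchy with made-up values (not your aggregated df2).
# "0-2% CD8+ Cells" sits under two different stroma bins.
toy <- data.frame(
  labels  = c("LU095", "0-10% Stroma", "10-20% Stroma",
              "0-2% CD8+ Cells", "0-2% CD8+ Cells"),
  parents = c("", "LU095", "LU095", "0-10% Stroma", "10-20% Stroma"),
  values  = c(13L, 8L, 5L, 8L, 5L),
  stringsAsFactors = FALSE
)

# Without ids, parents are matched against labels, so a repeated label is ambiguous.
repeated <- unique(toy$labels[duplicated(toy$labels)])
print(repeated)
# [1] "0-2% CD8+ Cells"

# Fix: give each sector an id that includes its parent, and point parents at those ids.
toy$ids <- c("LU095",
             "0-10% Stroma - LU095", "10-20% Stroma - LU095",
             "0-2% CD8+ Cells - 0-10% Stroma", "0-2% CD8+ Cells - 10-20% Stroma")
toy$parents <- c("", "LU095", "LU095",
                 "0-10% Stroma - LU095", "10-20% Stroma - LU095")

stopifnot(anyDuplicated(toy$ids) == 0)                       # ids are unique
stopifnot(all(toy$parents[toy$parents != ""] %in% toy$ids))  # parents refer to ids
```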

I'm starting with the data from your dput output, assigned to df2. This is the raw slide data, not the aggregated frame from your plot code.

For the root, or top level:

All of the values in slide are the same, but I wrote it this way to make it more dynamic. This returns one row because there is only one unique value at the highest level.

library(dplyr)

d1 <- df2 %>% group_by(slide) %>% 
  summarise(values = sum(Freq)) %>% 
  mutate(ids = slide, parents = "") %>%
  rename(labels = slide) %>% 
  select(ids, parents, labels, values) # same column order in every frame
# # A tibble: 1 × 4
#   ids   parents labels values
#   <chr> <chr>   <chr>   <int>
# 1 LU095 ""      LU095     435

For the next level (the middle ring, or first-child level):

I'll take the same approach, but group by stroma_bins instead of slide. Additionally, each id will combine the current level and its parent, which keeps the ids unique.

d2 <- df2 %>% group_by(stroma_bins) %>% 
  summarise(values = sum(Freq)) %>% 
  mutate(ids = paste0(stroma_bins, " - ", unique(df2$slide)),
         parents = unique(df2$slide)) %>% 
  rename(labels = stroma_bins) %>% 
  select(ids, parents, labels, values)
# # A tibble: 10 × 4
#    ids                    parents labels         values
#    <chr>                  <chr>   <fct>           <int>
#  1 0-10% Stroma - LU095   LU095   0-10% Stroma        8
#  2 10-20% Stroma - LU095  LU095   10-20% Stroma       6
#  3 20-30% Stroma - LU095  LU095   20-30% Stroma       8
#  4 30-40% Stroma - LU095  LU095   30-40% Stroma       9
#  5 40-50% Stroma - LU095  LU095   40-50% Stroma      25
#  6 50-60% Stroma - LU095  LU095   50-60% Stroma      24
#  7 60-70% Stroma - LU095  LU095   60-70% Stroma      36
#  8 70-80% Stroma - LU095  LU095   70-80% Stroma      53
#  9 80-90% Stroma - LU095  LU095   80-90% Stroma      68
# 10 90-100% Stroma - LU095 LU095   90-100% Stroma    198 
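Because the plot uses branchvalues = "total", each sector's value is read as the total for everything beneath it. A parent's value should therefore be at least the sum of its children's values; if it is smaller, Plotly may not draw that branch. Computing every level from the same Freq column guarantees this. The following is a quick base-R sanity check, using the totals printed for d1 and d2:

```r
# Totals copied from the d1 and d2 printouts above.
root_total    <- 435L                                             # LU095
stroma_totals <- c(8L, 6L, 8L, 9L, 25L, 24L, 36L, 53L, 68L, 198L)  # the ten stroma bins

# branchvalues = "total": a parent must be at least the sum of its children.
children_sum <- sum(stroma_totals)
print(children_sum)
# [1] 435
stopifnot(children_sum <= root_total)
```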

The next level has two ancestors, the stroma bin and the slide, so both are included. It follows the same premise as the last two, but the parents column has to combine both ancestors so that it matches the d2 ids exactly. (I only included a sample of what this frame looks like.)

d3 <- df2 %>% 
  rename(labels = cd8_percent_bins,
         values = Freq) %>% 
  mutate(ids = paste0(labels, " - ", stroma_bins),
         parents = paste0(stroma_bins, " - ", unique(df2$slide))) %>% 
  select(ids, parents, labels, values)
#                                    ids                parents            labels values
#  1:     0-2% CD8+ Cells - 0-10% Stroma   0-10% Stroma - LU095   0-2% CD8+ Cells      8
#  2:    0-2% CD8+ Cells - 10-20% Stroma  10-20% Stroma - LU095   0-2% CD8+ Cells      5
#  3:    4-6% CD8+ Cells - 10-20% Stroma  10-20% Stroma - LU095   4-6% CD8+ Cells      1
#  4:    0-2% CD8+ Cells - 20-30% Stroma  20-30% Stroma - LU095   0-2% CD8+ Cells      7
#  5:    2-4% CD8+ Cells - 20-30% Stroma  20-30% Stroma - LU095   2-4% CD8+ Cells      1
#  6:    0-2% CD8+ Cells - 30-40% Stroma  30-40% Stroma - LU095   0-2% CD8+ Cells      7
#  7:    2-4% CD8+ Cells - 30-40% Stroma  30-40% Stroma - LU095   2-4% CD8+ Cells      2
#  8:    0-2% CD8+ Cells - 40-50% Stroma  40-50% Stroma - LU095   0-2% CD8+ Cells     15
#  9:    2-4% CD8+ Cells - 40-50% Stroma  40-50% Stroma - LU095   2-4% CD8+ Cells      4
# 10:    4-6% CD8+ Cells - 40-50% Stroma  40-50% Stroma - LU095   4-6% CD8+ Cells      4

Next, combine the three data frames into one.

dd <- do.call(rbind, list(d1, d2, d3))
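Before plotting, it can help to check the combined frame for the problems that make sectors disappear:

* duplicated ids
* parents that don't match any id
* children whose values add up to more than their parent (with branchvalues = "total")

Below is a minimal base-R sketch. `check_sunburst()` is a hypothetical helper, not part of plotly, and the `ok`/`bad` frames use made-up ids and values:

```r
# Hypothetical helper: lists the usual reasons sunburst sectors go missing.
check_sunburst <- function(dd) {
  ids      <- as.character(dd$ids)
  parents  <- as.character(dd$parents)
  problems <- character(0)

  # 1. Every sector needs a unique id.
  dup <- unique(ids[duplicated(ids)])
  if (length(dup) > 0) problems <- c(problems, paste("duplicated id:", dup))

  # 2. Every non-root parent must match an existing id.
  orphan <- setdiff(parents[parents != ""], ids)
  if (length(orphan) > 0) problems <- c(problems, paste("unknown parent:", orphan))

  # 3. With branchvalues = "total", children must not add up to more than their parent.
  for (p in intersect(unique(parents), ids)) {
    kids_total <- sum(dd$values[parents == p])
    own_total  <- dd$values[match(p, ids)]
    if (kids_total > own_total) problems <- c(problems, paste("children exceed total:", p))
  }
  problems
}

# Made-up frames: a consistent hierarchy, then one whose leaf is too large.
ok <- data.frame(ids     = c("LU095", "S1 - LU095", "C1 - S1"),
                 parents = c("", "LU095", "S1 - LU095"),
                 labels  = c("LU095", "S1", "C1"),
                 values  = c(10L, 10L, 10L),
                 stringsAsFactors = FALSE)
bad <- ok
bad$values[3] <- 12L

check_sunburst(ok)
# character(0)
check_sunburst(bad)
# [1] "children exceed total: S1 - LU095"
```

On the dd built here, check_sunburst(dd) should return character(0). The ids are unique, every d3 parent matches a d2 id, and the stroma totals add up to the slide total.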

Now the data is ready.

plot_ly(dd, parents = ~parents, labels = ~labels, values = ~values,
        ids = ~ids, branchvalues = "total", type = "sunburst")
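Note that d2 and d3 use unique(df2$slide) as if it were a single value, and the d3 ids don't include the slide, so the code above assumes one slide. If the data ever holds several slides, one option is to keep the slide in every id.

Here is a base-R sketch under that assumption. `build_sunburst_frame()` is a hypothetical helper, and the two-slide `toy` data is made up:

```r
# Hypothetical helper: builds the ids/parents/labels/values frame for all three
# rings and keeps the slide in every id, so several slides can share one chart.
build_sunburst_frame <- function(df) {
  slide  <- as.character(df$slide)
  stroma <- as.character(df$stroma_bins)
  cd8    <- as.character(df$cd8_percent_bins)

  # Inner ring: one sector per slide.
  lvl1 <- aggregate(list(values = df$Freq), by = list(labels = slide), FUN = sum)
  lvl1$ids     <- lvl1$labels
  lvl1$parents <- ""

  # Middle ring: one sector per slide x stroma bin.
  lvl2 <- aggregate(list(values = df$Freq),
                    by = list(slide = slide, labels = stroma), FUN = sum)
  lvl2$ids     <- paste(lvl2$labels, lvl2$slide, sep = " - ")
  lvl2$parents <- lvl2$slide

  # Outer ring: one sector per slide x stroma bin x CD8 bin.
  lvl3 <- aggregate(list(values = df$Freq),
                    by = list(slide = slide, stroma = stroma, labels = cd8), FUN = sum)
  lvl3$ids     <- paste(lvl3$labels, lvl3$stroma, lvl3$slide, sep = " - ")
  lvl3$parents <- paste(lvl3$stroma, lvl3$slide, sep = " - ")  # matches lvl2 ids

  cols <- c("ids", "parents", "labels", "values")
  rbind(lvl1[cols], lvl2[cols], lvl3[cols])
}

# Made-up two-slide example: the "0-10% Stroma" bin exists on both slides.
toy <- data.frame(
  slide            = c("A", "A", "B"),
  stroma_bins      = c("0-10% Stroma", "0-10% Stroma", "0-10% Stroma"),
  cd8_percent_bins = c("0-2% CD8+ Cells", "2-4% CD8+ Cells", "0-2% CD8+ Cells"),
  Freq             = c(3L, 2L, 4L),
  stringsAsFactors = FALSE
)
dd <- build_sunburst_frame(toy)

# Plot exactly as before, e.g.:
# plotly::plot_ly(dd, ids = ~ids, parents = ~parents, labels = ~labels,
#                 values = ~values, branchvalues = "total", type = "sunburst")
```

Running all the rows through aggregate() also merges any repeated slide/stroma/CD8 combinations, so the outer-ring ids stay unique.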
