jakes

Reputation: 2095

How to extract multiple components from a list in a nested column into columns

As in the title: is there a way to extract multiple components from a list using purrr and dplyr? pluck works with a single element, but not with several. Something like below:

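library(tidyverse)

# group has only 20 values; data.frame() recycles them to fill 100 rows
my_data <- data.frame(group = sample(c('A', 'B', 'C'), 20, replace = TRUE), x = runif(100, 0, 10), y = runif(100, 0, 10))
my_data %>% 
  group_by(group) %>% 
  nest() %>% 
  # Attempt: pluck() takes one index per nesting level, so this does not return both components
  mutate(km_cluster = map(data, ~kmeans(.x, 3) %>% pluck(c('cluster', 'centers'))))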

Then I would like to add the cluster number and the centroid of the cluster each observation is assigned to directly into data. The desired output for a single element of the data column would be something like below:

structure(list(x = c(7.73117371369153, 0.0510848499834538, 4.55259998561814, 
9.89025634946302, 2.37372878007591, 1.97317335521802), y = 
c(7.59347913088277, 8.7801841692999, 9.11954281385988, 3.90361216617748, 
2.92225106153637, 0.338000932242721), cluster = c(3L, 1L, 1L, 3L, 2L, 2L), 
x_center = c(7.99236144404858, 2.53133282822091, 2.53133282822091, 
7.99236144404858, 3.79731344497379, 3.79731344497379), y_center = 
c(6.60092391962694, 8.42530809265251,8.42530809265251, 6.60092391962694, 
2.02696633155403, 2.02696633155403)), .Names = c("x", "y", "cluster", 
"x_center", "y_center"), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

Upvotes: 2

Views: 88

Answers (1)

www

Reputation: 39174

We can use `[` from base R, which subsets a list by several names at once.

my_data %>% 
  group_by(group) %>% 
  nest() %>% 
  mutate(km_cluster = map(data, ~kmeans(.x, 3) %>% `[`(c("cluster", "centers"))))
# # A tibble: 3 x 3
#   group data              km_cluster
#   <fct> <list>            <list>    
# 1 B     <tibble [25 x 2]> <list [2]>
# 2 C     <tibble [60 x 2]> <list [2]>
# 3 A     <tibble [15 x 2]> <list [2]>
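
As a minimal sketch of the difference, here is the same idea on a single kmeans fit outside the nested pipeline. The toy data frame and the seed are assumptions added only to make the example reproducible; they are not part of the original question.

```r
library(purrr)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible
toy <- data.frame(x = runif(30, 0, 10), y = runif(30, 0, 10))  # hypothetical data
fit <- kmeans(toy, 3)

# pluck() descends one level per argument, so it returns a single component
one <- pluck(fit, "cluster")

# `[` subsets the list by name and keeps every requested component
both <- fit[c("cluster", "centers")]
names(both)
```

Here one is just the integer vector of cluster assignments, while both is a two-element list holding the assignments and the matrix of centers.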

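UPDATE: to add each observation's cluster number and centroid coordinates to the nested data, fit kmeans once per group, then join the centers back by cluster number.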

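my_data2 <- my_data %>% 
  group_by(group) %>% 
  nest() %>% 
  # Fit k-means once per group
  mutate(km_cluster = map(data, ~kmeans(.x, 3))) %>%
  mutate(
    # Add each observation's cluster number
    data = map2(data, km_cluster, ~.x %>% mutate(cluster = .y[["cluster"]])),
    # Join the centroid coordinates, matching cluster number to centroid row
    data = map2(data, km_cluster, ~left_join(
      .x,
      .y %>%
        pluck("centers") %>%
        as_tibble() %>%  # as_data_frame() is deprecated in current tibble
        rowid_to_column() %>%
        rename(x_center = x, y_center = y),
      by = c("cluster" = "rowid")
    ))
  )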
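
A possible alternative, not the approach above but a sketch of a different technique: index the centers matrix by the cluster vector, which yields one centroid row per observation without a join. The helper name add_clusters, the seed, and the evenly sized toy groups are all assumptions made for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
# Hypothetical data: three groups of 30 observations each
my_data <- data.frame(group = rep(c("A", "B", "C"), length.out = 90),
                      x = runif(90, 0, 10),
                      y = runif(90, 0, 10))

# Hypothetical helper: fit k-means and attach cluster number and centroid
add_clusters <- function(df, k = 3) {
  fit <- kmeans(df, k)
  # Row-index the centers matrix by cluster to get one centroid per observation
  centers <- fit$centers[fit$cluster, , drop = FALSE]
  df %>%
    mutate(cluster = unname(fit$cluster),
           x_center = unname(centers[, "x"]),
           y_center = unname(centers[, "y"]))
}

result <- my_data %>%
  group_by(group) %>%
  nest() %>%
  mutate(data = map(data, add_clusters)) %>%
  unnest(data) %>%
  ungroup()
```

The final unnest() is only there to flatten the result for inspection; drop it to keep the nested layout from the question.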

Upvotes: 3
