john

Reputation: 63

R, extract columns as vectors from grouped data frame

Imagine you have this data frame:

x <- c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4")

y <- c("red", "yellow", "blue", "green", "black", "pink", "purple", 
"orange")

df <- data.frame(x, y, stringsAsFactors = FALSE)

I cannot think of a way, preferably using dplyr, to extract the y column after grouping the data frame. Essentially I'd like to know what colors are in a1, in a2, in a3, and in a4, and store those results as separate vectors, ideally in a list.

I could do

colors.in.a1 <- df %>% filter(x == "a1") %>% pull(y)

for each of a1, a2, a3, and a4, but that would take a while with my real data. I was hoping that pull() would respect the grouping the way tally() does, perhaps returning a list of vectors named after the grouping variable, but it doesn't.

Upvotes: 2

Views: 1485

Answers (2)

acylam

Reputation: 18681

With Base R only (thanks to @thelatemail's comment):

split(df$y, df$x)

or we can use nest:

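library(tidyverse)

df %>%
  group_by(x) %>%
  nest() %>%
  # turn each nested tibble into a plain character vector
  mutate(data = data %>% map(pull, y)) %>%
  # name the list from the nested x column, not from a global x vector
  with(setNames(data, x))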

Result:

$a1
[1] "red"   "black"

$a2
[1] "yellow" "pink"  

$a3
[1] "blue"   "purple"

$a4
[1] "green"  "orange"
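
As a quick way to check the base R version end to end, here is a minimal sketch that rebuilds the data from the question and then reads one group back by name. The variable name `colors` is only an illustration and does not appear in the question.

```r
# Rebuild the example data from the question
df <- data.frame(
  x = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4"),
  y = c("red", "yellow", "blue", "green", "black", "pink", "purple", "orange"),
  stringsAsFactors = FALSE
)

# split() partitions the values of y by the groups defined in x
# and returns a list named after those groups
colors <- split(df$y, df$x)

# Each element is a plain character vector, addressable by group name
colors[["a1"]]
names(colors)
```

Because the result is an ordinary named list, it should work directly with lapply() or purrr::map() if you need to process each group further.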

Upvotes: 2

tyluRp

Reputation: 4768

Another solution using dplyr and purrr:

library(dplyr)
library(purrr)

df %>% 
  split(.$x) %>% 
  map(pull, y)

Result:

$a1
[1] "red"   "black"

$a2
[1] "yellow" "pink"  

$a3
[1] "blue"   "purple"

$a4
[1] "green"  "orange"

Data:

df <- structure(list(x = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", 
"a4"), y = c("red", "yellow", "blue", "green", "black", "pink", 
"purple", "orange")), class = "data.frame", row.names = c(NA, 
-8L))
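
If you would rather stay with group_by(), one possible variation (not from the answers above, just a sketch) is to collapse each group into a list-column with summarise() and then convert the two-column result to a named list with tibble::deframe(). The name `colors` is again only illustrative, and this assumes reasonably recent versions of dplyr and tibble.

```r
library(dplyr)
library(tibble)

# Same example data as in the question
df <- data.frame(
  x = c("a1", "a2", "a3", "a4", "a1", "a2", "a3", "a4"),
  y = c("red", "yellow", "blue", "green", "black", "pink", "purple", "orange"),
  stringsAsFactors = FALSE
)

colors <- df %>%
  group_by(x) %>%
  summarise(y = list(y)) %>%  # one row per group; y becomes a list-column
  deframe()                   # first column gives the names, second the values

# Look up a single group by name
colors$a3
```

The names come from the same summarised table as the values, so they cannot drift out of step with the groups.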

Upvotes: 2
