BHN
BHN

Reputation: 47

Format nested data.frame grouped by ids using tidyverse

I have got a nested datame with two ids per row looking like:

df <- data.frame(
        sample = rep(paste0("s", 1:5), each = 5),
        ID1 = paste0("id1.", 1:5),
        ID2 = paste0("id2.", 1:5),
        counts = rep(1:5, each = 5)) %>%
    arrange(ID1) %>%
    group_by(ID1, ID2) %>% nest

I would like to get a dataframe for further analyses with the first row giving samples and in the next columns giving counts (per ids) with merged ids as colnames:

df3 <- data.frame(
    sample = paste0("s", 1:5),
    "id1.1|id2.1" = 1:5,
    "id1.2|id2.2" = 1:5,
    "id1.3|id2.3" = 1:5,
    "id1.4|id2.4" = 1:5,
    "id1.5|id2.5" = 1:5)

I have already started formating:

df2 <- df %>% 
    mutate(sample = data %>% map(pull, sample)) %>%
    mutate(counts = data %>% map(pull, counts))

However I am not sure what would be an elegant approach to continue.

Upvotes: 1

Views: 225

Answers (2)

akrun
akrun

Reputation: 887148

We can create the 'ID' column by pasteing the 'ID1', 'ID2', and pivot to wider format with pivot_wider

library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
df %>% 
     ungroup %>% 
     unnest %>% 
     transmute(ID = str_c(ID1, ID2, sep="_"), sample, counts) %>%
     pivot_wider(names_from = ID, values_from = counts)
# A tibble: 5 x 6
#  sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
#  <chr>        <int>       <int>       <int>       <int>       <int>
#1 s1               1           1           1           1           1
#2 s2               2           2           2           2           2
#3 s3               3           3           3           3           3
#4 s4               4           4           4           4           4
#5 s5               5           5           5           5           5

If there are multiple columns in the 'data' and wants to subset some columns

df %>%
   ungroup %>% 
   mutate(data = map(data, select, c(sample, counts))) %>%
   unnest %>% 
   transmute(ID = str_c(ID1, ID2, sep="_"), sample, counts) %>%
   pivot_wider(names_from = ID, values_from = counts)

Or in a single line with dcast

dcast(setDT(unnest(df)), sample ~ paste(ID1, ID2, sep="_"), values.var = 'counts')
#    sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
#1:     s1           1           1           1           1           1
#2:     s2           2           2           2           2           2
#3:     s3           3           3           3           3           3
#4:     s4           4           4           4           4           4
#5:     s5           5           5           5           5           5

Upvotes: 1

Darren Tsai
Darren Tsai

Reputation: 35554

A tidyr solution, which unnest the nested column and then pivot to wide format.

library(tidyr)

df %>%
  unite(ID, ID1, ID2) %>%
  unnest(data) %>%
  pivot_wider(names_from = ID, values_from = counts)

# # A tibble: 5 x 6
#   sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
#   <chr>        <int>       <int>       <int>       <int>       <int>
# 1 s1               1           1           1           1           1
# 2 s2               2           2           2           2           2
# 3 s3               3           3           3           3           3
# 4 s4               4           4           4           4           4
# 5 s5               5           5           5           5           5

Or starting with your work

df %>% 
  ungroup() %>% 
  mutate(sample = data %>% map("sample"),
         counts = data %>% map("counts"), .keep = "unused") %>% 
  unite(ID, ID1, ID2) %>%
  unnest(-ID) %>%
  pivot_wider(names_from = ID, values_from = counts)

Note that mutate(sample = data %>% map("sample") is a shortcut of mutate(sample = data %>% map(pull, sample)), which is a feature of map().

Upvotes: 1

Related Questions