Reputation: 47
I have got a nested datame with two ids per row looking like:
df <- data.frame(
sample = rep(paste0("s", 1:5), each = 5),
ID1 = paste0("id1.", 1:5),
ID2 = paste0("id2.", 1:5),
counts = rep(1:5, each = 5)) %>%
arrange(ID1) %>%
group_by(ID1, ID2) %>% nest
I would like to get a dataframe for further analyses with the first row giving samples and in the next columns giving counts (per ids) with merged ids as colnames:
df3 <- data.frame(
sample = paste0("s", 1:5),
"id1.1|id2.1" = 1:5,
"id1.2|id2.2" = 1:5,
"id1.3|id2.3" = 1:5,
"id1.4|id2.4" = 1:5,
"id1.5|id2.5" = 1:5)
I have already started formating:
df2 <- df %>%
mutate(sample = data %>% map(pull, sample)) %>%
mutate(counts = data %>% map(pull, counts))
However I am not sure what would be an elegant approach to continue.
Upvotes: 1
Views: 225
Reputation: 887148
We can create the 'ID' column by pasteing the 'ID1', 'ID2', and pivot to wider format with pivot_wider
library(dplyr)
library(purrr)
library(tidyr)
library(stringr)
df %>%
ungroup %>%
unnest %>%
transmute(ID = str_c(ID1, ID2, sep="_"), sample, counts) %>%
pivot_wider(names_from = ID, values_from = counts)
# A tibble: 5 x 6
# sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
# <chr> <int> <int> <int> <int> <int>
#1 s1 1 1 1 1 1
#2 s2 2 2 2 2 2
#3 s3 3 3 3 3 3
#4 s4 4 4 4 4 4
#5 s5 5 5 5 5 5
If there are multiple columns in the 'data' and wants to subset some columns
df %>%
ungroup %>%
mutate(data = map(data, select, c(sample, counts))) %>%
unnest %>%
transmute(ID = str_c(ID1, ID2, sep="_"), sample, counts) %>%
pivot_wider(names_from = ID, values_from = counts)
Or in a single line with dcast
dcast(setDT(unnest(df)), sample ~ paste(ID1, ID2, sep="_"), values.var = 'counts')
# sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
#1: s1 1 1 1 1 1
#2: s2 2 2 2 2 2
#3: s3 3 3 3 3 3
#4: s4 4 4 4 4 4
#5: s5 5 5 5 5 5
Upvotes: 1
Reputation: 35554
A tidyr
solution, which unnest the nested column and then pivot to wide format.
library(tidyr)
df %>%
unite(ID, ID1, ID2) %>%
unnest(data) %>%
pivot_wider(names_from = ID, values_from = counts)
# # A tibble: 5 x 6
# sample id1.1_id2.1 id1.2_id2.2 id1.3_id2.3 id1.4_id2.4 id1.5_id2.5
# <chr> <int> <int> <int> <int> <int>
# 1 s1 1 1 1 1 1
# 2 s2 2 2 2 2 2
# 3 s3 3 3 3 3 3
# 4 s4 4 4 4 4 4
# 5 s5 5 5 5 5 5
Or starting with your work
df %>%
ungroup() %>%
mutate(sample = data %>% map("sample"),
counts = data %>% map("counts"), .keep = "unused") %>%
unite(ID, ID1, ID2) %>%
unnest(-ID) %>%
pivot_wider(names_from = ID, values_from = counts)
Note that mutate(sample = data %>% map("sample")
is a shortcut of mutate(sample = data %>% map(pull, sample))
, which is a feature of map()
.
Upvotes: 1