I have a dataframe that looks like this:
in.dat <- data.frame(ID = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
                     DB = rep(c("bio", "bio", "func", "loc"), 2),
                     val = c("IPR1", "IPR2", "s43", "333-456",
                             "IPR7", "IPR8", "q87", "566-900"))
  ID   DB     val
1 A1  bio    IPR1
2 A1  bio    IPR2
3 A1 func     s43
4 A1  loc 333-456
5 B1  bio    IPR7
6 B1  bio    IPR8
7 B1 func     q87
8 B1  loc 566-900
I want to turn the values of "DB" into columns and, for each ID, collapse the string values with ";":
out.dat <- data.frame(ID = c("A1", "B1"),
                      bio = c("IPR1;IPR2", "IPR7;IPR8"),
                      func = c("s43", "q87"),
                      loc = c("333-456", "566-900"))
> out.dat
  ID       bio func     loc
1 A1 IPR1;IPR2  s43 333-456
2 B1 IPR7;IPR8  q87 566-900
I've played around with pivot_wider and group using dplyr, but I'm not quite getting what I want, since a group can have multiple values per ID that I want to collapse into one cell (e.g., "IPR1;IPR2").
Any solution would be appreciated!
Upvotes: 1
Views: 848
pivot_wider in recent tidyr versions takes an argument values_fn for a function that aggregates values before reshaping. This lets you do your operation in one function call.
library(tidyr)
in.dat %>%
  pivot_wider(names_from = DB, values_from = val,
              values_fn = list(val = ~paste(., collapse = ";")))
#> # A tibble: 2 x 4
#> ID bio func loc
#> <fct> <chr> <chr> <chr>
#> 1 A1 IPR1;IPR2 s43 333-456
#> 2 B1 IPR7;IPR8 q87 566-900
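To see what the function passed to values_fn does, it can help to run the aggregation by hand on a single cell. This is a minimal base R sketch, not part of the tidyr call itself; cell, collapsed, and single are names made up for illustration.

```r
# Values that share one ID/DB combination (ID "A1", DB "bio" in in.dat)
cell <- c("IPR1", "IPR2")

# paste() with collapse joins the whole vector into a single string
collapsed <- paste(cell, collapse = ";")
print(collapsed)

# A combination with only one value passes through unchanged
single <- paste("s43", collapse = ";")
print(single)
```

pivot_wider applies the same kind of function to every ID/DB combination, so each cell of the wide table ends up holding exactly one string.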
Upvotes: 2
We can also use spread with str_c:
library(dplyr)
library(tidyr)
library(stringr)
in.dat %>%
  group_by(ID, DB) %>%
  summarise(val = str_c(val, collapse = ";")) %>%
  spread(DB, val)
# A tibble: 2 x 4
# Groups: ID [2]
# ID bio func loc
# <fct> <chr> <chr> <chr>
#1 A1 IPR1;IPR2 s43 333-456
#2 B1 IPR7;IPR8 q87 566-900
Upvotes: 0
You can use dcast to do this.
in.dat <- data.frame(ID = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
                     DB = rep(c("bio", "bio", "func", "loc"), 2),
                     val = c("IPR1", "IPR2", "s43", "333-456",
                             "IPR7", "IPR8", "q87", "566-900"))
library(reshape2)
# value.var names the value column explicitly instead of letting dcast guess it
dcast(in.dat, ID ~ DB, paste0, collapse = ";", value.var = "val")
# ID bio func loc
#1 A1 IPR1;IPR2 s43 333-456
#2 B1 IPR7;IPR8 q87 566-900
Upvotes: 0
We can collapse val by ID and DB and then use pivot_wider.
library(dplyr)
in.dat %>%
  group_by(ID, DB) %>%
  summarise(val = paste0(val, collapse = ";")) %>%
  tidyr::pivot_wider(names_from = DB, values_from = val)
# ID bio func loc
# <fct> <chr> <chr> <chr>
#1 A1 IPR1;IPR2 s43 333-456
#2 B1 IPR7;IPR8 q87 566-900
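The same collapse-then-reshape idea can also be sketched without any packages. Note that this is an alternative base R approach using tapply, not the dplyr/tidyr code above; wide and out are names chosen only for illustration.

```r
in.dat <- data.frame(ID = c("A1", "A1", "A1", "A1", "B1", "B1", "B1", "B1"),
                     DB = rep(c("bio", "bio", "func", "loc"), 2),
                     val = c("IPR1", "IPR2", "s43", "333-456",
                             "IPR7", "IPR8", "q87", "566-900"),
                     stringsAsFactors = FALSE)

# Collapse val within each ID/DB combination.
# The result is a character matrix with IDs as row names and DB values as column names.
wide <- tapply(in.dat$val, list(in.dat$ID, in.dat$DB), paste, collapse = ";")

# Move the row names into a regular ID column
out <- data.frame(ID = rownames(wide), wide,
                  row.names = NULL, stringsAsFactors = FALSE)
print(out)
```

Because the collapsing and the reshaping both happen inside tapply, there is no separate grouping step; the trade-off is that the result starts out as a matrix and has to be converted back to a data frame.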
Upvotes: 0