Reputation: 790
I have a following dataframe:
df1 <- structure(list(group = c("KO", "WT", "KO", "KO", "KO", "KO",
"WT", "KO", "KO", "WT", "WT", "WT", "WT", "WT", "WT", "WT", "WT",
"WT", "WT", "KO", "KO"), name = c("rike", "rabe", "smake", "rike",
"rike", "rike", "rabe", "rike", "rike", "due", "rabe", "ene",
"ene", "due", "ene", "rabe", "due", "rabe", "due", "smake", "kum"
), type = c("C", "A", "A", "A", "C", "B", "A", "B", "B", "A",
"B", "A", "C", "C", "C", "C", "B", "C", "A", "C", "A"), posit = c(10,
2, 21, 5, 12, 22, 18, 19, 81, 22, 33, 31, 80, 40, 16, 16, 7,
9, 26, 27, 7)), row.names = c(NA, -21L), class = "data.frame")
I would like to combine 2 columns, one character ("type") and one numeric ("posit") in that manner, that all of categories (letters) would be joined with corresponding posits (numbers), for instance "A" and "37" as "A37" and all of the type-posit pairs for given "name" would be pasted in new column in ascending order (from lesser to bigger values). Also I'd like them to be separated with ":". The desired output is given below:
df2 <-structure(list(group = c("WT", "WT", "WT", "KO", "KO", "KO"),
name = c("ene", "due", "rabe", "kum", "rike", "smake"), type_posit = c("C16:A31:C80",
"B7:A22:A26:C40", "A2:C9:C16:A18:B33", "A7", "A5:C10:C12:B19:B22:B81",
"A21:C27")), class = "data.frame", row.names = c(NA, -6L))
I can achieve this by using set of dplyr functions, and creating intermediate dataframes, like this:
df2 <- df1 %>%
dplyr::mutate(t_p = paste0(type,posit)) %>%
dplyr::arrange(name,posit) %>%
dplyr::select(-type, -posit) %>%
dplyr::group_by(group, name) %>%
dplyr::summarise(tag_pos =paste0(t_p, collapse = ":"))
However I wonder, whether there is more efficient and/or cleaner way to do so? I would like to write a clean, understandable code.
Upvotes: 5
Views: 73
Reputation: 887601
Using data.table
library(data.table)
setDT(df1)[order(posit), .(type_posit = paste(type, sep = "",
posit, collapse = ":")),.(group, name)]
-output
group name type_posit
<char> <char> <char>
1: WT rabe A2:C9:C16:A18:B33
2: KO rike A5:C10:C12:B19:B22:B81
3: WT due B7:A22:A26:C40
4: KO kum A7
5: WT ene C16:A31:C80
6: KO smake A21:C27
Upvotes: 1
Reputation: 35594
You need to sort the rows by posit
and then summarise each group with paste0(..., collapse = ':')
.
library(dplyr)
df1 %>%
group_by(group, name) %>%
arrange(posit, .by_group = TRUE) %>%
summarise(type_posit = paste0(type, posit, collapse = ':'), .groups = 'drop')
# # A tibble: 6 × 3
# group name type_posit
# <chr> <chr> <chr>
# 1 KO kum A7
# 2 KO rike A5:C10:C12:B19:B22:B81
# 3 KO smake A21:C27
# 4 WT due B7:A22:A26:C40
# 5 WT ene C16:A31:C80
# 6 WT rabe A2:C9:C16:A18:B33
Upvotes: 3
Reputation: 73387
Using aggregate
basically.
aggregate(paste0(type, posit) ~ group + name, df1, paste, collapse=':')
# group name paste0(type, posit)
# 1 WT due A22:C40:B7:A26
# 2 WT ene A31:C80:C16
# 3 KO kum A7
# 4 WT rabe A2:A18:B33:C16:C9
# 5 KO rike C10:A5:C12:B22:B19:B81
# 6 KO smake A21:C27
Sorted version:
aggregate(paste0(type, posit) ~ group + name, df1, \(x) paste(sort(x), collapse=':'))
# group name paste0(type, posit)
# 1 WT due A22:A26:B7:C40
# 2 WT ene A31:C16:C80
# 3 KO kum A7
# 4 WT rabe A18:A2:B33:C16:C9
# 5 KO rike A5:B19:B22:B81:C10:C12
# 6 KO smake A21:C27
Upvotes: 2