Reputation: 1
I have a data table that looks that mostly has no duplicates, but sometimes we have some duplicates and they look as such...
ID1 | ID 2 | Value 1 | Value 2 | Value 3
a1 | a2 | Montreal | 652462 | 2021-08-04
a1 | a2 | Toronto | 753842 | 2021-08-03
I want to turn the entire table into
ID1 | ID2 | Value 1 | Value 2 | Value 3 | Value 4 | Value 5 | Value 6
a1 | a2 | Montreal | 652462 | 2021-08-04 | Toronto | 753842 | 2021-08-03
What is the simplest way to do this?
Thanks
Upvotes: 0
Views: 144
Reputation: 388862
You can create an index column for reshaping -
library(dplyr)
library(tidyr)
df %>%
mutate(index = data.table::rowid(ID1, ID2)) %>%
pivot_wider(names_from = index, values_from = starts_with('Value'))
# ID1 ID2 Value1_1 Value1_2 Value2_1 Value2_2 Value3_1 Value3_2
# <chr> <chr> <chr> <chr> <dbl> <dbl> <date> <date>
#1 a1 a2 Montreal Toronto 652462 753842 2021-08-04 2021-08-03
Upvotes: 0
Reputation: 16978
I'm still not convinced that this is a good way to transform your data. You could do this:
library(dplyr)
library(stringr)
new_df <- df %>%
group_by(ID1, ID2) %>%
add_count() %>%
mutate(m = row_number())
seq_n <- unique(new_df$n)
seq_m <- unique(new_df$m)[-1]
result <- new_df %>% filter(m == 1)
for (i in seq_n) {
for (j in seq_m) {
result <- result %>%
left_join(
new_df %>% filter(n == i, m == j),
by = c("ID1", "ID2"),
suffix = c("", ".y")
)
}
}
result %>%
select(-matches("^(n|m).*")) %>%
rename_with(~ paste0("Value",
3 * str_count(.x, "\\.y") +
as.integer(str_extract(.x, "(?<=Value)\\d+"))),
starts_with("Value"))
Based on a dataset like this
# A tibble: 3 x 5
ID1 ID2 Value1 Value2 Value3
<chr> <chr> <chr> <dbl> <date>
1 a1 a2 Montreal 652462 2021-08-04
2 a1 a2 Toronto 753842 2021-08-03
3 a1 a2 Toronto 753842 2021-08-03
this returns
# A tibble: 1 x 11
# Groups: ID1, ID2 [1]
ID1 ID2 Value1 Value2 Value3 Value4 Value5 Value6 Value7 Value8 Value9
<chr> <chr> <chr> <dbl> <date> <chr> <dbl> <date> <chr> <dbl> <date>
1 a1 a2 Montreal 652462 2021-08-04 Toronto 753842 2021-08-03 Toronto 753842 2021-08-03
df <- structure(list(ID1 = c("a1", "a1", "a1"), ID2 = c("a2", "a2",
"a2"), Value1 = c("Montreal", "Toronto", "Toronto"), Value2 = c(652462,
753842, 753842), Value3 = structure(c(18843, 18842, 18842), class = "Date")), class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
cols = list(ID1 = structure(list(), class = c("collector_character",
"collector")), ID2 = structure(list(), class = c("collector_character",
"collector")), Value1 = structure(list(), class = c("collector_character",
"collector")), Value2 = structure(list(), class = c("collector_double",
"collector")), Value3 = structure(list(format = ""), class = c("collector_date",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Upvotes: 1