marcod666
marcod666

Reputation: 1

Turn duplicate rows into columns in R

I have a data table that looks that mostly has no duplicates, but sometimes we have some duplicates and they look as such...

ID1 | ID 2 | Value 1  | Value 2 | Value 3

a1  |  a2  | Montreal | 652462  | 2021-08-04

a1  |  a2  | Toronto  | 753842  | 2021-08-03

I want to turn the entire table into

ID1 | ID2 | Value 1  | Value 2 | Value 3    | Value 4  | Value 5 | Value 6

a1  | a2  | Montreal | 652462  | 2021-08-04 | Toronto  | 753842  | 2021-08-03

What is the simplest way to do this?

Thanks

Upvotes: 0

Views: 144

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388862

You can create an index column for reshaping -

library(dplyr)
library(tidyr)

df %>%
  mutate(index = data.table::rowid(ID1, ID2)) %>%
  pivot_wider(names_from = index, values_from = starts_with('Value'))

#  ID1   ID2   Value1_1 Value1_2 Value2_1 Value2_2 Value3_1   Value3_2  
#  <chr> <chr> <chr>    <chr>       <dbl>    <dbl> <date>     <date>    
#1 a1    a2    Montreal Toronto    652462   753842 2021-08-04 2021-08-03

Upvotes: 0

Martin Gal
Martin Gal

Reputation: 16978

I'm still not convinced that this is a good way to transform your data. You could do this:

library(dplyr)
library(stringr)

new_df <- df %>% 
  group_by(ID1, ID2) %>% 
  add_count() %>% 
  mutate(m = row_number())

seq_n <- unique(new_df$n)
seq_m <- unique(new_df$m)[-1]

result <- new_df %>% filter(m == 1)

for (i in seq_n) {
  for (j in seq_m) {
    result <- result %>% 
      left_join(
        new_df %>% filter(n == i, m == j), 
        by = c("ID1", "ID2"), 
        suffix = c("", ".y")
        )
  }
}

result %>% 
  select(-matches("^(n|m).*")) %>% 
  rename_with(~ paste0("Value", 
                       3 * str_count(.x, "\\.y") + 
                         as.integer(str_extract(.x, "(?<=Value)\\d+"))),
              starts_with("Value"))

Based on a dataset like this

# A tibble: 3 x 5
  ID1   ID2   Value1   Value2 Value3    
  <chr> <chr> <chr>     <dbl> <date>    
1 a1    a2    Montreal 652462 2021-08-04
2 a1    a2    Toronto  753842 2021-08-03
3 a1    a2    Toronto  753842 2021-08-03

this returns

# A tibble: 1 x 11
# Groups:   ID1, ID2 [1]
  ID1   ID2   Value1   Value2 Value3     Value4  Value5 Value6     Value7  Value8 Value9    
  <chr> <chr> <chr>     <dbl> <date>     <chr>    <dbl> <date>     <chr>    <dbl> <date>    
1 a1    a2    Montreal 652462 2021-08-04 Toronto 753842 2021-08-03 Toronto 753842 2021-08-03

Data

df <- structure(list(ID1 = c("a1", "a1", "a1"), ID2 = c("a2", "a2", 
"a2"), Value1 = c("Montreal", "Toronto", "Toronto"), Value2 = c(652462, 
753842, 753842), Value3 = structure(c(18843, 18842, 18842), class = "Date")), class = c("spec_tbl_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L), spec = structure(list(
    cols = list(ID1 = structure(list(), class = c("collector_character", 
    "collector")), ID2 = structure(list(), class = c("collector_character", 
    "collector")), Value1 = structure(list(), class = c("collector_character", 
    "collector")), Value2 = structure(list(), class = c("collector_double", 
    "collector")), Value3 = structure(list(format = ""), class = c("collector_date", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
    "collector")), skip = 1L), class = "col_spec"))

Upvotes: 1

Related Questions