Transform duplicate rows to columns

Question

I'm working on a dataset with hundreds of variables. Because it originates from JSON, it is very difficult to organize: instead of putting the information in columns, the file creates new rows. See the example:

library(tibble)

df1 <- tibble(ID = c(111,111,111,111,111,111,222,222,333),
              NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
              ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
              COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
              CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
              COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA'))

I would like to reshape the data so that there is one row per ID, as in the example below:

df2 <- tibble(ID = c(111,222,333),
              NAME1 = c('JOHN','WILL','MARK'),
              NAME2 = c('MARY','',''),
              NAME3 = c('JAMES','',''),
              ADRESS1 = c('NY','LONDON',''),
              ADRESS2 = c('NY','TOKYO',''),
              ADRESS3 = c('ROMA','',''),
              COLOR1 = c('GREEN','BLUE','ORANGE'),
              COLOR2 = c('RED','',''),
              COLOR3 = c('YELLOW','',''),
              CAR1 = c('','FORD','FERRARI'),
              CAR2 = c('BMW','',''),
              CAR3 = c('TRUCK','',''),
              COUNTRY = c('USA','USA','USA'))

Note, however, that COUNTRY does not need to be split into several columns (COUNTRY1, COUNTRY2, COUNTRY3), because its value is the same in every row of a given ID. My original file contains many variables like this. How can I reshape df1 into the layout of df2?
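With hundreds of variables, the columns that behave like COUNTRY can be found programmatically instead of by eye. A minimal sketch, assuming dplyr 1.0 or later (for across() and where()) and that ID is the grouping key: count the distinct values of every column within each ID, then keep the columns whose count is 1 for every ID. The name constant_cols is illustrative only.

```r
library(dplyr)

df1 <- tibble(
  ID      = c(111, 111, 111, 111, 111, 111, 222, 222, 333),
  NAME    = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS  = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR   = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR     = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = rep('USA', 9)
)

# One row per ID; each column holds the number of distinct values
# that variable takes within that ID (across() skips the grouping column).
per_id_counts <- df1 %>%
  group_by(ID) %>%
  summarise(across(everything(), n_distinct))

# Columns with exactly one distinct value in every ID do not need
# to be spread into COL1, COL2, ... columns.
constant_cols <- per_id_counts %>%
  select(-ID) %>%
  select(where(function(x) all(x == 1))) %>%
  names()

constant_cols
```

On the example data this returns only "COUNTRY"; the resulting names could then be passed to pivot_wider(id_cols = ...) together with ID.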

akrun · Accepted Answer

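Another option is pivot_wider() from tidyr: remove duplicate rows, number the remaining rows within each ID, and use that number as the column suffix.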

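library(dplyr)
library(tidyr)
library(data.table)  # only needed for rowid()

distinct(df1) %>%                      # drop exact duplicate rows
  mutate(rn = rowid(ID)) %>%           # number rows 1, 2, 3, ... within each ID
  # ID and COUNTRY are not listed, so they act as id columns and stay single
  pivot_wider(names_from = rn, values_from = NAME:CAR, 
    names_sep = "", values_fill = "") %>%
  select(-COUNTRY, COUNTRY)            # move COUNTRY to the last column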

-output

# A tibble: 3 × 14
     ID NAME1 NAME2  NAME3   ADRESS1  ADRESS2 ADRESS3 COLOR1 COLOR2 COLOR3   CAR1      CAR2   CAR3    COUNTRY
  <dbl> <chr> <chr>  <chr>   <chr>    <chr>   <chr>   <chr>  <chr>  <chr>    <chr>     <chr>  <chr>   <chr>
1   111 JOHN  "MARY" "JAMES" "NY"     "NY"    "ROMA"  GREEN  "RED"  "YELLOW" ""        "BMW"  "TRUCK" USA    
2   222 WILL  "WILL" ""      "LONDON" "TOKYO" ""      BLUE   "BLUE" ""       "FORD"    "FORD" ""      USA    
3   333 MARK  ""     ""      ""       ""      ""      ORANGE ""     ""       "FERRARI" ""     ""      USA
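The same reshaping can be sketched without data.table by replacing rowid() with row_number() inside group_by(). This version also names the unsplit columns explicitly through id_cols (ID and COUNTRY, an assumption that matches the example data) and uses relocate(), which requires dplyr 1.0 or later. Like the accepted answer, it numbers distinct rows, so for ID 222 it gives NAME2 = "WILL" rather than the empty string shown in the question's df2.

```r
library(dplyr)
library(tidyr)

df1 <- tibble(
  ID      = c(111, 111, 111, 111, 111, 111, 222, 222, 333),
  NAME    = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
  ADRESS  = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
  COLOR   = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
  CAR     = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
  COUNTRY = rep('USA', 9)
)

wide <- df1 %>%
  distinct() %>%                      # drop exact duplicate rows
  group_by(ID) %>%
  mutate(rn = row_number()) %>%       # 1, 2, 3, ... within each ID
  ungroup() %>%
  pivot_wider(
    id_cols     = c(ID, COUNTRY),     # columns kept as a single column
    names_from  = rn,
    values_from = c(NAME, ADRESS, COLOR, CAR),
    names_sep   = "",                 # NAME1, NAME2, ... instead of NAME_1
    values_fill = ""                  # empty string where an ID has fewer rows
  ) %>%
  relocate(COUNTRY, .after = last_col())

wide
```

Listing id_cols explicitly makes the intent visible when there are many variables; the vector of constant columns could be supplied there instead of typing the names by hand.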
