Gabriela Ramirez
Gabriela Ramirez

Reputation: 13

Duplicate rows except for certain values in R. Want to turn those values into their own columns

I have a dataset which contains duplicate rows. Except these rows have 5 columns which don't hold the same value which prevents me from using distinct(). I was wondering if there was a way I could turn

Column1 Column2 Column3 Column4 Column5 Column6 Column7 Column8
1         2        3        7     9       10      11      4
1         2        3        8     10      11      12      5

Into this

Column1 Column2 Column3 Column4_a Column4_b Column5_a Column5_b Column6_a Column6_b Column7_a Column7_b 
1         2        3        7        8          9        10        10       11         11       12  

              

Upvotes: 0

Views: 27

Answers (2)

akrun
akrun

Reputation: 887038

We could do this with pivot_wider after creating a sequence column

library(dplyr)
library(tidyr)
df %>% 
  mutate(rn = 1, rn2 = row_number()) %>%
  pivot_wider(names_from = rn2, values_from = starts_with('Column'))

Upvotes: 0

Duck
Duck

Reputation: 39595

This can be close to what you want. Reshape data to long and then group by the variable and omit the duplicated rows. After that you can create the names for the future columns and reshape to wide. Here the code using tidyverse functions:

library(tidyverse)
#Code
newdf <- df %>%
  pivot_longer(everything()) %>%
  arrange(name) %>%
  group_by(name) %>%
  filter(!duplicated(value)) %>%
  mutate(name=paste0(name,'_',row_number())) %>%
  pivot_wider(names_from = name,values_from=value)

Output:

# A tibble: 1 x 13
  Column1_1 Column2_1 Column3_1 Column4_1 Column4_2 Column5_1 Column5_2 Column6_1 Column6_2 Column7_1
      <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>     <int>
1         1         2         3         7         8         9        10        10        11        11
# ... with 3 more variables: Column7_2 <int>, Column8_1 <int>, Column8_2 <int>

Some data used:

#Data
df <- structure(list(Column1 = c(1L, 1L), Column2 = c(2L, 2L), Column3 = c(3L, 
3L), Column4 = 7:8, Column5 = 9:10, Column6 = 10:11, Column7 = 11:12, 
    Column8 = 4:5), class = "data.frame", row.names = c(NA, -2L
))

Upvotes: 1

Related Questions