user330
user330

Reputation: 1270

Staking multiple columns to two columns and removing duplicates in R

I have multiple columns, but here is only a part of my data:

df<-read.table (text=" Color1   Size1   Color2  Size2   Color3  Size3
Yellow  AA  Gray    GB  Purpul  MO
Blue    BD  Cyne    CE  Gray    GB
Yellow  AA  Yellow  AA  Black   LL
Red MD  Reddark KK  Reddark KK
Green   MC  Reddark KK  Green   MC
", header=TRUE)

I want to bring down all the columns and show them as two columns and then remove duplicates to get this table:

Color   Size
Yellow  AA
Blue    BD
Red MD
Green   MC
Gray    GB
Cyne    CE
Reddark KK
Purpul  MO
Black   LL

I try Reshape2 using melt, but I struggled to do it.

Upvotes: 2

Views: 51

Answers (3)

kabanus
kabanus

Reputation: 25895

With no other libraries, reshape and unique can get the job done:

> unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
      Color Size
1.1  Yellow   AA
2.1    Blue   BD
4.1     Red   MD
5.1   Green   MC
1.2    Gray   GB
2.2    Cyne   CE
4.2 Reddark   KK
1.3  Purpul   MO
3.3   Black   LL

Pivoting seems like overkill to me, but what do I know. If the index bothers you (though it saves the information on how the wide table was structured) then reset the row names:

> uniq = unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
> rownames(uniq) = NULL

Upvotes: 3

akrun
akrun

Reputation: 887048

We can use pivot_longer from tidyr to reshape from 'wide' to 'long' in two columns by specifying the names_sep as the boundary between a letter and a digit ((?<=[a-z])(?=\\d)) in the column names and then take the distinct of the two columns

library(dplyr)
library(tidyr)
pivot_longer(df, cols = everything(),
     names_to = c( '.value', 'grp'), names_sep="(?<=[a-z])(?=\\d)") %>% 
   distinct(Color, Size)

-output

# A tibble: 9 x 2
#  Color   Size 
#  <chr>   <chr>
#1 Yellow  AA   
#2 Gray    GB   
#3 Purpul  MO   
#4 Blue    BD   
#5 Cyne    CE   
#6 Black   LL   
#7 Red     MD   
#8 Reddark KK   
#9 Green   MC   

Or using data.table

library(data.table)
unique(melt(setDT(df), measure = patterns('^Color', '^Size'),
       value.name = c('Color', 'Size'))[, variable := NULL])
#     Color Size
#1:  Yellow   AA
#2:    Blue   BD
#3:     Red   MD
#4:   Green   MC
#5:    Gray   GB
#6:    Cyne   CE
#7: Reddark   KK
#8:  Purpul   MO
#9:   Black   LL

data

df <- structure(list(Color1 = c("Yellow", "Blue", "Yellow", "Red", 
"Green"), Size1 = c("AA", "BD", "AA", "MD", "MC"), Color2 = c("Gray", 
"Cyne", "Yellow", "Reddark", "Reddark"), Size2 = c("GB", "CE", 
"AA", "KK", "KK"), Color3 = c("Purpul", "Gray", "Black", "Reddark", 
"Green"), Size3 = c("MO", "GB", "LL", "KK", "MC")), 
class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 1

Duck
Duck

Reputation: 39595

Another way of using pivot_longer() and pivot_wider() can be:

library(dplyr)
library(tidyr)
#Code
newdf <- df %>%
  pivot_longer(everything()) %>%
  mutate(name=substr(name,1,nchar(name)-1)) %>%
  group_by(name) %>% mutate(id2=row_number()) %>%
  pivot_wider(names_from = name,values_from=value) %>%
  select(-id2) %>%
  filter(!duplicated(paste(Color,Size)))

Output:

# A tibble: 9 x 2
  Color   Size 
  <fct>   <fct>
1 Yellow  AA   
2 Gray    GB   
3 Purpul  MO   
4 Blue    BD   
5 Cyne    CE   
6 Black   LL   
7 Red     MD   
8 Reddark KK   
9 Green   MC  

Upvotes: 1

Related Questions