Reputation: 1270
I have multiple columns, but here is only a part of my data:
df<-read.table (text=" Color1 Size1 Color2 Size2 Color3 Size3
Yellow AA Gray GB Purpul MO
Blue BD Cyne CE Gray GB
Yellow AA Yellow AA Black LL
Red MD Reddark KK Reddark KK
Green MC Reddark KK Green MC
", header=TRUE)
I want to bring down all the columns and show them as two columns and then remove duplicates to get this table:
Color Size
Yellow AA
Blue BD
Red MD
Green MC
Gray GB
Cyne CE
Reddark KK
Purpul MO
Black LL
I try Reshape2 using melt, but I struggled to do it.
Upvotes: 2
Views: 51
Reputation: 25895
With no other libraries, reshape and unique can get the job done:
> unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
Color Size
1.1 Yellow AA
2.1 Blue BD
4.1 Red MD
5.1 Green MC
1.2 Gray GB
2.2 Cyne CE
4.2 Reddark KK
1.3 Purpul MO
3.3 Black LL
Pivoting seems like overkill to me, but what do I know. If the index bothers you (though it saves the information on how the wide table was structured) then reset the row names:
> uniq = unique(reshape(df, varying=1:6, direction="long", v.names=c("Color", "Size"), timevar=NULL)[1:2])
> rownames(uniq) = NULL
Upvotes: 3
Reputation: 887048
We can use pivot_longer
from tidyr
to reshape from 'wide' to 'long' in two columns by specifying the names_sep
as the boundary between a letter and a digit ((?<=[a-z])(?=\\d)
) in the column names and then take the distinct
of the two columns
library(dplyr)
library(tidyr)
pivot_longer(df, cols = everything(),
names_to = c( '.value', 'grp'), names_sep="(?<=[a-z])(?=\\d)") %>%
distinct(Color, Size)
-output
# A tibble: 9 x 2
# Color Size
# <chr> <chr>
#1 Yellow AA
#2 Gray GB
#3 Purpul MO
#4 Blue BD
#5 Cyne CE
#6 Black LL
#7 Red MD
#8 Reddark KK
#9 Green MC
Or using data.table
library(data.table)
unique(melt(setDT(df), measure = patterns('^Color', '^Size'),
value.name = c('Color', 'Size'))[, variable := NULL])
# Color Size
#1: Yellow AA
#2: Blue BD
#3: Red MD
#4: Green MC
#5: Gray GB
#6: Cyne CE
#7: Reddark KK
#8: Purpul MO
#9: Black LL
df <- structure(list(Color1 = c("Yellow", "Blue", "Yellow", "Red",
"Green"), Size1 = c("AA", "BD", "AA", "MD", "MC"), Color2 = c("Gray",
"Cyne", "Yellow", "Reddark", "Reddark"), Size2 = c("GB", "CE",
"AA", "KK", "KK"), Color3 = c("Purpul", "Gray", "Black", "Reddark",
"Green"), Size3 = c("MO", "GB", "LL", "KK", "MC")),
class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 1
Reputation: 39595
Another way of using pivot_longer()
and pivot_wider()
can be:
library(dplyr)
library(tidyr)
#Code
newdf <- df %>%
pivot_longer(everything()) %>%
mutate(name=substr(name,1,nchar(name)-1)) %>%
group_by(name) %>% mutate(id2=row_number()) %>%
pivot_wider(names_from = name,values_from=value) %>%
select(-id2) %>%
filter(!duplicated(paste(Color,Size)))
Output:
# A tibble: 9 x 2
Color Size
<fct> <fct>
1 Yellow AA
2 Gray GB
3 Purpul MO
4 Blue BD
5 Cyne CE
6 Black LL
7 Red MD
8 Reddark KK
9 Green MC
Upvotes: 1