user330
user330

Reputation: 1270

Removing individual duplicate columns and staking them in R

A small sample of the columns are as follows:

df<-read.table (text=" Name1   id1 Nam2    id2 Name3   id3
'Hay, Gho'    1   'Nar,Sim' 2   'Tag,Kami'    19
'Ray, Com'   13  'Kam,Nao' 5   'Nar,kasi'    20
'Hay,Gho' 1   'Hal,Ghw' 4   'Nar,kasi'    20
'Bab,Sim' 11  'Hay,Gho' 3   'Kam,Nao' 24
'Tag,kami'    22  'Bab,Sim' 6   'Ray,com' 29
'Bab,sim' 11  'Nar,Sim' 2   'Tag,Kami'    19

", header=TRUE)

I wan to remove duplicates within each column and then stack them to get the following table:

Name    code    Id
Hayo,Gho 1  id1
Ray,com 13  id1
Bab,Sim 11  id1
Tag,kim 22  id1
Nar,Sim 2   1d2
Kam,Nao 5   1d2
Hal,Ghw 4   1d2
Hay,Gho 3   1d2
Bab,Sim 6   id2
Tag,Kimi 19 id3
Nar,Kasi 20 id3
Kam,Nao 24  id3
Ray,com 19  id3

Upvotes: 1

Views: 54

Answers (2)

akrun
akrun

Reputation: 887048

We can do this with pivot_longer in a more compact way

library(dplyr)
library(tidyr)
pivot_longer(df, cols = everything(), names_to = c('.value', "Id"), 
       names_sep = "(?<=[a-z])(?=[0-9])") %>%
  distinct

-output

Upvotes: 0

Duck
Duck

Reputation: 39595

Try this approach. It uses a join after reshaping data by id and name. Also, some cleaning tasks were needed as names had spaces and lower, upper cases. Here the code using dplyr and tidyr:

library(dplyr)
library(tidyr)
#Code
newdf <- df %>% dplyr::select(starts_with('Nam')) %>%
  mutate(id=row_number()) %>%
  pivot_longer(-id) %>%
  rename(Name=value) %>%
  group_by(id) %>%
  mutate(Var=paste0('Var',row_number())) %>%
  dplyr::select(-name) %>%
  left_join(
    df %>% dplyr::select(starts_with('id')) %>%
      mutate(id=row_number()) %>%
      pivot_longer(-id) %>%
      rename(Id=name) %>%
      group_by(id) %>%
      mutate(Var=paste0('Var',row_number()))
  ) %>% ungroup() %>% dplyr::select(-c(id,Var)) %>%
  arrange(Id) %>%
  mutate(Val=toupper(gsub(" ", "", Name, fixed = TRUE))) %>%
  group_by(Id) %>%
  filter(!duplicated(Val)) %>% dplyr::select(-Val) %>%
  mutate(Name=gsub(" ", "", Name, fixed = TRUE))

Output:

# A tibble: 13 x 3
# Groups:   Id [3]
   Name     Id    value
   <chr>    <chr> <int>
 1 Hay,Gho  id1       1
 2 Ray,Com  id1      13
 3 Bab,Sim  id1      11
 4 Tag,kami id1      22
 5 Nar,Sim  id2       2
 6 Kam,Nao  id2       5
 7 Hal,Ghw  id2       4
 8 Hay,Gho  id2       3
 9 Bab,Sim  id2       6
10 Tag,Kami id3      19
11 Nar,kasi id3      20
12 Kam,Nao  id3      24
13 Ray,com  id3      29

Upvotes: 1

Related Questions