user6340762
user6340762

Reputation: 175

Use R to combine duplicate columns

I have a table with 4 columns with columns 1,3 and columns 2,4 representing the same variable.

Codes    Description        Codes     Description
xxxxx    describes xxxxx    zzzzz     describes zzzzz
yyyyy    describes yyyyy    12345     describes 12345

I want to convert the table into

Codes    Description       
xxxxx    describes xxxxx    
zzzzz    describes zzzzz
yyyyy    describes yyyyy  
12345    describes 12345

Right now I am doing this use by :

df_temp <- df[,3:4]
df <- df[, - c(3, 4)]
df <- rbind(df, df_temp)

But I have to do it for several table and my method doesn't seem very efficient. Would it be possible to do this using %>% or writing a function?

Upvotes: 3

Views: 3518

Answers (3)

Scott Ritchie
Scott Ritchie

Reputation: 10543

You can use the duplicated function to dynamically identify and rbind the duplicate columns:

combine_duplicates <- function(df) {
   duplicate_columns <- duplicated(colnames(df))
   return(rbind(df[,duplicate_columns], df[,!duplicate_columns]))
}
combine_duplicates(df)

This will work on any table, provided the duplicate columns are always in the same order as the original column names (e.g. c("Codes", "Description", "Codes", "Description") will work, c("Codes", "Description", "Description", "Codes") will not) and that there are no extra columns without duplicates in the table.

Upvotes: 2

alistaire
alistaire

Reputation: 43334

A list-based approach with purrr:

library(purrr)

df %>% map(as.character) %>%    # convert factors to character, leave as list
    split(names(.)) %>%    # split list by column names
    map_df(flatten_chr)   # unlist each element and reassemble to data.frame

## # A tibble: 4 × 2
##   Codes     Description
##   <chr>           <chr>
## 1 xxxxx describes xxxxx
## 2 yyyyy describes yyyyy
## 3 zzzzz describes zzzzz
## 4 12345 describes 12345

or in base R,

as.data.frame(lapply(split(lapply(df, 
                                  as.character), 
                           names(df)), 
                     unlist))

##   Codes     Description
## 1 xxxxx describes xxxxx
## 2 yyyyy describes yyyyy
## 3 zzzzz describes zzzzz
## 4 12345 describes 12345

Upvotes: 3

thelatemail
thelatemail

Reputation: 93813

A generalisable answer that will work for any number of pairs of variables in a somewhat sensible order (code,desc,code,desc or code,code,desc,desc) is:

names(dat) <- make.unique(names(dat)[c(1:2,1:length(dat))])[-(1:2)]
#[1] "Codes.1"       "Description.1" "Codes.2"       "Description.2"
reshape(dat, direction="long", varying=1:4, timevar=NULL)
#    Codes    Description id
#1.1 xxxxx describesxxxxx  1
#2.1 yyyyy describesyyyyy  2
#1.2 zzzzz describeszzzzz  1
#2.2 12345 describes12345  2

Upvotes: 2

Related Questions