Christopher Yee
Christopher Yee

Reputation: 545

Duplicate Columns when using the merge statement

When I try to merge some codes to the code descriptions I get 2 duplicate columns. I start out with this: Table Name: Test

ID     State
1      5
2      2
3      5

and want to merge it with this: Table Name: statecode

StateID State
5       Mass
2       NY

to make a table like this:

ID    State
1     Mass 
2     NY
3     Mass

However, I get a table like this:

ID    State    State
1     5        Mass
2     2        NY
3     5        Mass

I used the merge command like this:

test = merge(x = test, y = statecode, by.x = "State", by.y = "StateID", all.x = T)

Is there a better function other than merge to use in this case? Maybe one to just replace the state code with the state name?

Thank you very much for the help!

Upvotes: 2

Views: 84

Answers (2)

Pierre L
Pierre L

Reputation: 28461

Another way with base R:

Pmerge <- function(df1, df2) {
  res <- suppressWarnings(merge(df1, df2, by.x = "State", by.y = "Code", all.x = T)[,-1])
  newdf <- res[order(res$ID),]
  row.names(newdf) <- 1:nrow(newdf)
  newdf
}

Pmerge(Test, statecode)
  ID State
1  1  Mass
2  2    NY
3  3  Mass

Upvotes: 2

Carlos Cinelli
Carlos Cinelli

Reputation: 11617

You do have to say which column you want to drop, but you can express it concisely using dplyr, for example.

Generating sample data based on yours (but correcting the column names):

test <- read.table(text =
"ID    StateID
1      5
2      2
3      5", header = TRUE)

statecode <- read.table(text =
"
StateID       State
5          Mass
2          NY", header = TRUE)

Using dplyr:

library(dplyr)
test %>% left_join(statecode, by = "StateID") %>% select(-StateID)
  ID State
1  1  Mass
2  2    NY
3  3  Mass

Upvotes: 3

Related Questions