Jessica
Jessica

Reputation: 101

R - merge() returns NA´s in ALL columns although all.x=T

I am new here and have searched the forum for my problem, but did not find a solution. I have two data frames which I want to merge on a common key field.

          merge(x
               ,y
               ,by.x="a"
               ,by.y="b"
               ,all.x=T
               ,sort=F
               )

Since my x-Dataframe has more rows than my y-Dataframe, I want to keep all rows from x with NA for the column from y but with all values in the columns from x. This code is only giving me extra rows for the unmatched cases with NA in ALL columns (columns from x and y). I would be really grateful if someone could help me out? Where is my mistake?

Example:

a = data.frame(c(111,222,333,444),c(1,5,3,8))
b = data.frame(c(111,222),c(0.1,0.4))
colnames(a)=c("code","value")
colnames(b)=c("code","value")
c = merge(a
          ,b
          ,by="code"
          ,all.x=T)

In this example it is working properly. In my data I obtain NA in all columns in row 3&4.

I hope you can understand my lousy example?!

Thank you! Jessica ;)

Upvotes: 10

Views: 19558

Answers (2)

Patrick
Patrick

Reputation: 1027

So curiously if you re-run the exact same code from @nograpes but you have the same column name in x and y, then you get NAs like you had in your example. See below

x <- data.frame(val1=c(2,8,6,3), a=c('h','k','b','e'))
y <- data.frame(val2=c(4,1),     a=c('h','e'))
merge(x, y, by = 'a', all=TRUE)

Result from the console is

a  val1 val2
b    6   NA
e    3    1
h    2    4
k    8   NA

Upvotes: 0

nograpes
nograpes

Reputation: 18323

Just set all=TRUE.

# Create your data
x<-data.frame(val1=c(2,8,6,3),a=c('h','k','b','e'))
y<-data.frame(val2=c(4,1),b=c('h','e'))
# Outer join
merge(x,y,by.x='a',by.y='b',all=TRUE)
#   a val1 val2
# 1 b    6   NA
# 2 e    3    1
# 3 h    2    4
# 4 k    8   NA

Upvotes: 2

Related Questions