Reputation: 24655
I have a data frame a in which a few cells are missing. I have since gathered the missing data into another data frame, b.
I usually fill in the missing data with the following code:
# for each row of b, overwrite var1 in the rows of a with the same uid
for (loop.b in seq_len(nrow(b))) {
  a[a[, "uid"] == b[loop.b, "uid"], "var1"] <- b[loop.b, "var1"]
}
This works OK for me, but what if b has many rows? Then the explicit loop will make the process slow. Is there a more elegant way to do this kind of "missing data replacement"?
Thanks.
Upvotes: 1
Views: 2474
Reputation: 4339
This works:
# position of each a$uid in b$uid, NA where there is no match
ind = match(a$uid, b$uid)
# 'ind' holds row indices into b plus NAs; we drop the NAs on both sides
a[!is.na(ind),"var1"] = b[ind[!is.na(ind)],"var1"]
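As a quick check, here is a minimal sketch of the same approach on made-up data. The values in a and b below are assumptions for illustration only, not the asker's data.

```r
# Hypothetical data: a has gaps in var1, b carries the replacements
a <- data.frame(uid = c(10, 20, 30, 40), var1 = c(1, NA, 3, NA))
b <- data.frame(uid = c(40, 20), var1 = c(4, 2))

# For each row of a, the position of its uid in b (NA when absent from b)
ind <- match(a$uid, b$uid)
found <- !is.na(ind)

# Overwrite only the rows of a that have a counterpart in b
a[found, "var1"] <- b[ind[found], "var1"]
```

Rows of a whose uid does not occur in b are left untouched, which mirrors what the original loop does.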
Upvotes: 1
Reputation: 49033
Assuming the following two data frames are similar to what you describe:
R> a <- data.frame(uid=1:10,var1=c(1:3,NA,5:7,NA,9:10))
R> a
uid var1
1 1 1
2 2 2
3 3 3
4 4 NA
5 5 5
6 6 6
7 7 7
8 8 NA
9 9 9
10 10 10
R> b <- data.frame(uid=c(8,4),var1=c(74,82))
R> b
uid var1
1 8 74
2 4 82
Then, because uid here coincides with the row number of a, you can index the rows of a directly with b$uid:
R> a[b$uid,"var1"] <- b$var1
Which gives:
R> a
uid var1
1 1 1
2 2 2
3 3 3
4 4 82
5 5 5
6 6 6
7 7 7
8 8 74
9 9 9
10 10 10
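If uid is not the same as the row number, positional indexing would not address the intended rows. One possible workaround, sketched here on hypothetical data (the uids 101 to 104 are made up), is to use uid as the row names of a and index by name instead:

```r
# Hypothetical data where uid is NOT the row number of a
a <- data.frame(uid = c(101, 102, 103, 104), var1 = c(1, NA, 3, NA))
b <- data.frame(uid = c(104, 102), var1 = c(74, 82))

# Use uid as row names so rows can be addressed by uid (requires unique uids)
rownames(a) <- a$uid

# Character indices are looked up against the row names
a[as.character(b$uid), "var1"] <- b$var1
```

This assumes every uid in b already exists in a and that uids in a are unique, since row names must be.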
Upvotes: 1
Reputation: 29477
I think you want match, but it's hard to guess at what your data are like.
## a's var1 has some missing values
a <- data.frame(var1 = c(1, NA, 4.5, NA, 6.5), uid = 5:1)
## b knows all about them
b <- data.frame(var1 = c(2.3, 8.9), uid = c(2, 4))
## find the indexes in a$uid that match b$uid
ind <- match(b$uid, a$uid)
## those indexes can now be filled directly with b$var1
a$var1[ind] <- b$var1
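That will run even if the uids are not unique (though the name sort of suggests that they are), but note that match returns only the first matching position, so with duplicated uids in a only the first matching row is filled.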
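One case the snippet above does not cover is a uid in b that is absent from a, which leaves an NA in ind. A minimal sketch of a guard, assuming a hypothetical extra row in b with uid 99:

```r
## a's var1 has some missing values (same as above)
a <- data.frame(var1 = c(1, NA, 4.5, NA, 6.5), uid = 5:1)
## hypothetical b with one uid (99) that does not occur in a
b <- data.frame(var1 = c(2.3, 8.9, 7.7), uid = c(2, 4, 99))

ind <- match(b$uid, a$uid)
## keep only the rows of b that have a counterpart in a
keep <- !is.na(ind)
a$var1[ind[keep]] <- b$var1[keep]
```

The unmatched row of b is simply skipped, and a keeps its original number of rows.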
Upvotes: 0