Reputation: 17
My data (total 8532 obs) look like this:
Prd_Id Weight
DRA24 19.35
DRA24 NA
DRA24 NA
DRA24 19.35
DRA24 19.35
DRA59 8.27
DRA59 8.27
DRA59 8.27
DRA59 8.27
DRA59 NA
DRA59 NA
Basically the problem is that there are many pairs of Prd_id
and weight
and in them some of the Prd_id
doesnt have weight
mentioned for example I have shown in the data that first has but 2nd and 3rd doesn't so I know the value of weight
and I just have to replace NA with it, all identical Prd_id
will have same weight
but in R there is no such thing like a dictionary so I am finding it difficult to fix this problem. I tried using for loop
but it is taking a very long time, my code looks like this:
for(i in 1:nrow(bms)){
for(j in 1:1555){
if(spl$Prd_Id[j]==bms$Prd_Id[i]){
bms$weight[i]=spl$weight[j]
}
}
}
bms
is the whole data
(8532 obs) and spl
(1555 obs) is a subset of bms
with uniques values of Prd_Id
.
Upvotes: 1
Views: 934
Reputation: 4187
No need for a left_join
:
bms %>%
group_by(Prd_Id) %>%
mutate(Weight = Weight[!is.na(Weight)][1])
another way with first
:
bms %>%
group_by(Prd_Id) %>%
mutate(Weight = first(Weight[!is.na(Weight)]))
The result:
# A tibble: 11 x 2
# Groups: Prd_Id [2]
Prd_Id Weight
<chr> <dbl>
1 DRA24 19.35
2 DRA24 19.35
3 DRA24 19.35
4 DRA24 19.35
5 DRA24 19.35
6 DRA59 8.27
7 DRA59 8.27
8 DRA59 8.27
9 DRA59 8.27
10 DRA59 8.27
11 DRA59 8.27
Off course you can do this in vanilla R also:
transform(bms, Weight = ave(Weight, Prd_Id, FUN = function(x) x[!is.na(x)][1]))
The result is the same:
Prd_Id Weight
1 DRA24 19.35
2 DRA24 19.35
3 DRA24 19.35
4 DRA24 19.35
5 DRA24 19.35
6 DRA59 8.27
7 DRA59 8.27
8 DRA59 8.27
9 DRA59 8.27
10 DRA59 8.27
11 DRA59 8.27
Upvotes: 1
Reputation: 18450
Here is a base R solution
# example data
bms <- data.frame(
Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)
# create key-value pairs
spl <- unique(bms[!is.na(bms[,"Weight"]),])
spl <- setNames(spl[,"Weight"], spl[,"Prd_Id"])
# fill NAs
idx <- which(is.na(bms[,"Weight"]))
bms[idx,"Weight"] <- spl[bms[idx, "Prd_Id"]]
Upvotes: 0
Reputation: 816
As @r2evans suggested you could use a SQL-like join strategy, combined with dplyr's coalesce
this would look something like this:
library(dplyr)
# create 'bms'.
bms <- data_frame(
Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)
# create 'spl'
spl <- bms %>% filter(!is.na(Weight)) %>% filter(!duplicated(Prd_Id))
# SQL-like join and coalesce strategy
res <- bms %>%
left_join(spl, by = "Prd_Id", suffix = c("_bms", "_spl")) %>%
mutate(Weight = coalesce(Weight_bms, Weight_spl)) %>%
select(-Weight_bms, -Weight_spl)
Upvotes: 1