goyalshubh
goyalshubh

Reputation: 17

How to add key-value pair like dictionary?

My data (total 8532 obs) look like this:

Prd_Id  Weight
DRA24   19.35
DRA24   NA
DRA24   NA
DRA24   19.35
DRA24   19.35
DRA59   8.27
DRA59   8.27
DRA59   8.27
DRA59   8.27
DRA59   NA
DRA59   NA

Basically the problem is that there are many pairs of Prd_id and weight and in them some of the Prd_id doesnt have weight mentioned for example I have shown in the data that first has but 2nd and 3rd doesn't so I know the value of weight and I just have to replace NA with it, all identical Prd_id will have same weight but in R there is no such thing like a dictionary so I am finding it difficult to fix this problem. I tried using for loop but it is taking a very long time, my code looks like this:

for(i in 1:nrow(bms)){
  for(j in 1:1555){
    if(spl$Prd_Id[j]==bms$Prd_Id[i]){
      bms$weight[i]=spl$weight[j]
    }
  }
}

bms is the whole data (8532 obs) and spl (1555 obs) is a subset of bms with uniques values of Prd_Id.

Upvotes: 1

Views: 934

Answers (3)

h3rm4n
h3rm4n

Reputation: 4187

No need for a left_join:

bms %>% 
  group_by(Prd_Id) %>% 
  mutate(Weight = Weight[!is.na(Weight)][1])

another way with first:

bms %>% 
  group_by(Prd_Id) %>% 
  mutate(Weight = first(Weight[!is.na(Weight)]))

The result:

# A tibble: 11 x 2
# Groups:   Prd_Id [2]
   Prd_Id Weight
    <chr>  <dbl>
 1  DRA24  19.35
 2  DRA24  19.35
 3  DRA24  19.35
 4  DRA24  19.35
 5  DRA24  19.35
 6  DRA59   8.27
 7  DRA59   8.27
 8  DRA59   8.27
 9  DRA59   8.27
10  DRA59   8.27
11  DRA59   8.27

Off course you can do this in vanilla R also:

transform(bms, Weight = ave(Weight, Prd_Id, FUN = function(x) x[!is.na(x)][1]))

The result is the same:

   Prd_Id Weight
1   DRA24  19.35
2   DRA24  19.35
3   DRA24  19.35
4   DRA24  19.35
5   DRA24  19.35
6   DRA59   8.27
7   DRA59   8.27
8   DRA59   8.27
9   DRA59   8.27
10  DRA59   8.27
11  DRA59   8.27

Upvotes: 1

Karsten W.
Karsten W.

Reputation: 18450

Here is a base R solution

# example data
bms <- data.frame(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)

# create key-value pairs
spl <- unique(bms[!is.na(bms[,"Weight"]),])
spl <- setNames(spl[,"Weight"], spl[,"Prd_Id"])

# fill NAs
idx <- which(is.na(bms[,"Weight"]))
bms[idx,"Weight"] <- spl[bms[idx, "Prd_Id"]]

Upvotes: 0

67342343
67342343

Reputation: 816

As @r2evans suggested you could use a SQL-like join strategy, combined with dplyr's coalescethis would look something like this:

library(dplyr)

# create 'bms'.
bms <- data_frame(
  Prd_Id = c("DRA24", "DRA24", "DRA24", "DRA24", "DRA24", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59", "DRA59"),
  Weight = c(19.35, NA, NA, 19.35, 19.35, 8.27, 8.27, 8.27, 8.27, NA, NA)
)

# create 'spl'
spl <- bms %>% filter(!is.na(Weight)) %>% filter(!duplicated(Prd_Id))

# SQL-like join and coalesce strategy
res <- bms %>% 
  left_join(spl, by = "Prd_Id", suffix = c("_bms", "_spl")) %>% 
  mutate(Weight = coalesce(Weight_bms, Weight_spl)) %>%
  select(-Weight_bms, -Weight_spl)

Upvotes: 1

Related Questions