mickeyt500
mickeyt500

Reputation: 85

How to fill a missing value based on an association in R

I am trying to fill out the NA values based on a lookup in the item_code column. Basically if the item_code has a assigned section, I would like it to look at the item_code in that row and check if there is a section assigned to the code elsewhere in the data, if yes use that or else NA. This is a huge data set.

                   item_code            section
1                  50406737               556
2                  48147401               NA
3                  49762314               NA
4                  47860166               557
5                  48147401               557
6                  49762314               NA
7                  49762314               554
8                  50884988               554
9                  50856064               NA
10                 49762314               554
11                 50868629               556
12                 51041955               556
13                 50856064               NA
14                 48147401               NA
15                 50460172               557
16                 50856064               559
17                 47860166               557
18                 50459661               557

Upvotes: 0

Views: 66

Answers (1)

lbusett
lbusett

Reputation: 5932

This should do the trick (notice that I added and additional item_code in the table to add the case in which an item_code only has NA values in section, which was missing from your example data)

require(tidyverse)
df= read.table(text =
"item_code            section
1                  50406737               556
2                  48147401               NA
3                  49762314               NA
4                  47860166               557
5                  48147401               557
6                  49762314               NA
7                  49762314               554
8                  50884988               554
9                  50856064               NA
10                 49762314               554
11                 50868629               556
12                 51041955               556
13                 50856064               NA
14                 48147401               NA
15                 50460172               557
16                 50856064               559
17                 47860166               557
18                 50459661               557
19                 50459662               NA",
  header = TRUE
)

df2 <- df %>% 
  group_by(item_code) %>%
  mutate(section = max(section, na.rm = T)) %>% 
  distinct(section) %>%
  print()

Source: local data frame [11 x 2]
Groups: item_code [11]

   section item_code
     <int>     <int>
1      556  50406737
2      557  48147401
3      554  49762314
4      557  47860166
5      554  50884988
6      559  50856064
7      556  50868629
8      556  51041955
9      557  50460172
10     557  50459661
11      NA  50459662

Upvotes: 1

Related Questions