Fons MA

Reputation: 1282

Filling NA values in categorical variable with values above while maintaining other row values in R

EDIT:

sumshyftw's answer works for the sample data below, but note that the rows must first be sorted with arrange, so that each NA row sits directly below the row holding its value, before fill can work correctly.
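
To illustrate the point about sorting first, here is a minimal sketch on a small made-up data frame (the `shuffled` object and its row order are hypothetical, not taken from the full dataset). It assumes that sorting by ST_NAME and AC_NO places every NA row directly below the row that carries its district.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: rows are deliberately out of AC_NO order
shuffled <- data.frame(
  ST_NAME   = c("Gujarat", "Gujarat", "Gujarat", "Gujarat"),
  AC_NO     = c(45, 159, 44, 160),
  DIST_NAME = c(NA, "SURAT", "AHMADABAD", NA),
  stringsAsFactors = FALSE
)

filled <- shuffled %>%
  arrange(ST_NAME, AC_NO) %>%  # put each NA row right below its labelled row
  fill(DIST_NAME)              # carry the last non-NA value downwards

filled
```

Without the arrange step, fill would work on the rows as stored, so AC_NO 160 would pick up AHMADABAD from the row above it and AC_NO 45 would stay NA.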

ORIGINAL QUESTION

A snippet of some Indian assembly constituency (AC) election data with the relevant issue looks like this:

AC_elections <- structure(list(ST_NAME = c("Gujarat", "Gujarat", "Gujarat", "Gujarat", 
"Gujarat", "Gujarat", "Gujarat", "Gujarat", "Gujarat", "Gujarat", 
"Madhya Pradesh", "Madhya Pradesh", "Madhya Pradesh", "Madhya Pradesh"
), AC_NO = c(44, 45, 46, 47, 48, 159, 160, 161, 162, 163, 204, 
205, 206, 207), DIST_NAME = structure(c(1L, NA, NA, NA, NA, 3L, 
NA, NA, NA, NA, 2L, NA, NA, NA), .Label = c("AHMADABAD", "INDORE", 
"SURAT"), class = "factor"), UR_TYPE = structure(c(1L, NA, NA, 
NA, NA, 1L, NA, NA, NA, NA, 1L, NA, NA, NA), .Label = "Urban", class = "factor"), 
    YEAR = c(2012, 2012, 2012, 2012, 2012, 2012, 2012, 2012, 
    2012, 2012, 2013, 2013, 2013, 2013), AC_NAME = c("Ellisbridge", 
    "Naranpura", "Nikol", "Naroda", "Thakkarbapa Nagar", "Surat East", 
    "Surat North", "Varachha Road", "Karanj", "Limbayat", "Indore-1", 
    "Indore-2", "Indore-3", "Indore-4"), AC_TYPE = c("GEN", "GEN", 
    "GEN", "GEN", "GEN", "GEN", "GEN", "GEN", "GEN", "GEN", "GEN", 
    "GEN", "GEN", "GEN"), PARTYABBRE = c("BJP", "BJP", "BJP", 
    "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", "BJP", 
    "BJP", "BJP")), row.names = c(974L, 4131L, 4132L, 4133L, 
4134L, 1077L, 4143L, 4144L, 4145L, 4146L, 2002L, 4151L, 4152L, 
4153L), class = "data.frame")

The values in DIST_NAME and UR_TYPE that should replace NA values can be deduced from the AC_NO preceding those NA values. So, we could fix this with something like:

library(dplyr)

AC_elections %>% 
  mutate(
    DIST_NAME = case_when(
      ST_NAME == "Gujarat" & AC_NO > 44 & AC_NO < 49 ~ "AHMADABAD",
      ST_NAME == "Gujarat" & AC_NO > 159 & AC_NO < 164 ~ "SURAT",
      ST_NAME == "Madhya Pradesh" & AC_NO > 204 & AC_NO < 208 ~ "INDORE",
      # keep the districts that are already filled in
      TRUE ~ as.character(DIST_NAME)
    )
    # UR_TYPE = case_when(<similar code to above>)
  )

But I suspect there's a much more efficient and elegant solution. I was wondering whether there is something like the na.fill function in zoo that would apply in this case. Note that in the original dataset the rows with NA do not sit directly after the row with the relevant AC_NO (see the row names above).

Thanks for any hints!

Upvotes: 0

Views: 38

Answers (1)

sumshyftw

Reputation: 1131

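Maybe the fill function from tidyr? By default it fills each NA with the last non-NA value above it:

library(dplyr)
library(tidyr)

AC_election = AC_elections %>% fill(DIST_NAME, UR_TYPE)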

which gives you

            ST_NAME AC_NO DIST_NAME UR_TYPE YEAR           AC_NAME AC_TYPE PARTYABBRE
974         Gujarat    44 AHMADABAD   Urban 2012       Ellisbridge     GEN        BJP
4131        Gujarat    45 AHMADABAD   Urban 2012         Naranpura     GEN        BJP
4132        Gujarat    46 AHMADABAD   Urban 2012             Nikol     GEN        BJP
4133        Gujarat    47 AHMADABAD   Urban 2012            Naroda     GEN        BJP
4134        Gujarat    48 AHMADABAD   Urban 2012 Thakkarbapa Nagar     GEN        BJP
1077        Gujarat   159     SURAT   Urban 2012        Surat East     GEN        BJP
4143        Gujarat   160     SURAT   Urban 2012       Surat North     GEN        BJP
4144        Gujarat   161     SURAT   Urban 2012     Varachha Road     GEN        BJP
4145        Gujarat   162     SURAT   Urban 2012            Karanj     GEN        BJP
4146        Gujarat   163     SURAT   Urban 2012          Limbayat     GEN        BJP
2002 Madhya Pradesh   204    INDORE   Urban 2013          Indore-1     GEN        BJP
4151 Madhya Pradesh   205    INDORE   Urban 2013          Indore-2     GEN        BJP
4152 Madhya Pradesh   206    INDORE   Urban 2013          Indore-3     GEN        BJP
4153 Madhya Pradesh   207    INDORE   Urban 2013          Indore-4     GEN        BJP
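
Since the question asks about zoo, the closest equivalent there is probably na.locf ("last observation carried forward") rather than na.fill. The sketch below uses a made-up character vector (`district` is a hypothetical stand-in for DIST_NAME) and assumes the values are already in the right order.

```r
library(zoo)

# Hypothetical stand-in for the DIST_NAME column, already sorted
district <- c("AHMADABAD", NA, NA, "SURAT", NA, "INDORE", NA)

# na.rm = FALSE keeps the vector length unchanged even if it starts with NA
filled <- na.locf(district, na.rm = FALSE)

filled
```

Inside a pipeline this could be applied with mutate, for example to as.character(DIST_NAME), but fill from tidyr handles several columns in one call.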

Upvotes: 1
