John P. S.

Reputation: 383

Iterations with Loops and Functions

I wanted to run a loop that reads the values in a data frame (data_rais), but I realised it might take days, and I think this is due to the fact that I'm running a loop rather than a function. I tried several times to write a function that does the same as this loop, but I couldn't find a way to do so. My question is: is it possible to turn this loop into a function? How?

for (i in 1:nrow(data_rais)) {
  if (is.na(data_rais$postal_code[i])) {
    next()
  } else {
    data_rais$munic_name[i] = munics_code[row(munics_code)[which(munics_code$cods == data_rais$munic[i])], 1]
  }
}

munics_code looks like this:

  munics_code = tibble::tribble(
    ~municipio,~cods,
    'BELFORD ROXO', 261,
    'DUQUE DE CAXIAS', 250,
    'DUQUE DE CAXIAS', 251,
    'DUQUE DE CAXIAS', 252,
    'DUQUE DE CAXIAS', 253,
    'DUQUE DE CAXIAS', 254,
    'ITABORAÍ', 248,
    'ITAGUAÍ', 2380,
    'ITAGUAÍ', 2381,
    'ITAGUAÍ', 2382,
    'ITAGUAÍ', 2383,
    'ITAGUAÍ', 2384,
    'MAGÉ', 259,
    'MANGARATIBA',2386,
    'MANGARATIBA',2387,
    'MANGARATIBA',2388,
    'MARICÁ',249,
    'MESQUITA',2655)

And data_rais$postal_code is a column of a data frame containing numbers that may or may not start with the digits of one of the codes in the cods column of munics_code. Something like...

data_rais = data.frame(postal_code = c(2049253, 2033069, 2293513, 2411920, 2284937, 2341811, 2008638, 
                                       2279827, NA, 2386135, 2441900, 2392889, 2332114, 2254610, 
                                       2114414, 2089509, 2351781, 2451466, 2111632, 2070417, 2079485, 
                                       2328146, 2200329, 2116103, NA, 2449114, 2231708, NA, 
                                       NA, 2194253),
                       munic_name = NA)

Note: I cannot delete the NAs, I don't want to lose them.

Upvotes: 0

Views: 67

Answers (2)

TobiO

Reputation: 1381

I would suggest you use match(), which looks up all the codes in one vectorised call:

data_rais$munic_name = munics_code[[1]][match(data_rais$munic, munics_code$cods)]

To fill only the rows that have a postal code, as your loop does, and leave the other rows untouched, use the following:

data_rais$munic_name[!is.na(data_rais$postal_code)] = munics_code[[1]][match(data_rais$munic[!is.na(data_rais$postal_code)], munics_code$cods)]

I'm not sure whether you need the second approach, but be careful about overwriting original variables. If you're unsure, add a new variable instead and inspect the matching manually for a few entries.
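A minimal sketch of the match() approach on toy data, in case it helps to see it run end to end. The `munic` column and its values are assumptions made up for illustration (the sample data in the question does not show that column), and the lookup table is a shortened copy of munics_code.

```r
# Shortened copy of the lookup table from the question.
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MANGARATIBA", "MESQUITA"),
  cods      = c(261, 250, 2386, 2655),
  stringsAsFactors = FALSE
)

# Toy data: the `munic` column is hypothetical, invented for this sketch.
data_rais <- data.frame(
  postal_code = c(2386135, NA, 2611234, 2655000, 2049253),
  munic       = c(2386, 261, 261, 2655, 204),
  munic_name  = NA_character_,
  stringsAsFactors = FALSE
)

# Only rows with a postal code are filled, as in the original loop.
has_pc <- !is.na(data_rais$postal_code)

# match() gives, for each munic, the row index in munics_code (NA if absent).
idx <- match(data_rais$munic[has_pc], munics_code$cods)
data_rais$munic_name[has_pc] <- munics_code$municipio[idx]

print(data_rais)
```

Rows without a postal code and rows whose code is not in the table both end up with NA in munic_name, and no row is dropped.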

Upvotes: 3

R. Schifini

Reputation: 9313

If I interpreted your code correctly, you are trying to set the data_rais$munic_name column to the corresponding municipio. This could be done with a merge:

df = merge(x = data_rais, y = munics_code, by.x = "munic", by.y = "cods", all.x = TRUE)

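By doing a left merge (all.x = TRUE) you'll keep every row of data_rais, including those with NAs. Note that merge() sorts the result by the key column by default, so the row order may change. Assign the result back to data_rais if you want to add the municipio column to it.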
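A small sketch of the left merge on toy data, assuming the join key is a hypothetical `munic` column holding the municipality code (as in the loop in the question). The `row_id` helper column is also an assumption, added only to restore the original row order after merge() sorts by key.

```r
# Shortened copy of the lookup table from the question.
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MANGARATIBA", "MESQUITA"),
  cods      = c(261, 250, 2386, 2655),
  stringsAsFactors = FALSE
)

# Toy data: `munic` is a hypothetical column invented for this sketch.
data_rais <- data.frame(
  postal_code = c(2386135, NA, 2611234, 2655000, 2049253),
  munic       = c(2386, 261, 261, 2655, 204),
  stringsAsFactors = FALSE
)

# Remember the original order, because merge() sorts by the key by default.
data_rais$row_id <- seq_len(nrow(data_rais))

# Left merge: every row of data_rais is kept, unmatched codes get NA.
merged <- merge(x = data_rais, y = munics_code,
                by.x = "munic", by.y = "cods", all.x = TRUE)

# Restore the original row order.
merged <- merged[order(merged$row_id), ]

# Mimic the loop: rows without a postal code are left without a name.
merged$municipio[is.na(merged$postal_code)] <- NA

print(merged)
```

The last step is only needed if rows with a missing postal code should stay unnamed, which is what the loop in the question does.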

Upvotes: 1
