Reputation: 383
I want to run a loop that reads values from a data frame (data_rais), but I realised it might take days to finish. I think this is because I'm using a loop rather than a function. I tried several times to write a function that does the same thing as this loop, but I couldn't find a way to do so. My question is: is it possible to turn this loop into a function? How?
for (i in 1:nrow(data_rais)) {
  if (is.na(data_rais$postal_code[i])) {
    next
  } else {
    data_rais$munic_name[i] = munics_code[row(munics_code)[which(munics_code$cods == data_rais$munic[i])], 1]
  }
}
munics_code looks like this:
munics_code = tibble::tribble(
~municipio,~cods,
'BELFORD ROXO', 261,
'DUQUE DE CAXIAS', 250,
'DUQUE DE CAXIAS', 251,
'DUQUE DE CAXIAS', 252,
'DUQUE DE CAXIAS', 253,
'DUQUE DE CAXIAS', 254,
'ITABORAÍ', 248,
'ITAGUAÍ', 2380,
'ITAGUAÍ', 2381,
'ITAGUAÍ', 2382,
'ITAGUAÍ', 2383,
'ITAGUAÍ', 2384,
'MAGÉ', 259,
'MANGARATIBA',2386,
'MANGARATIBA',2387,
'MANGARATIBA',2388,
'MARICÁ',249,
'MESQUITA',2655)
And data_rais$postal_code is a column of a data frame with numbers that may or may not start with the numbers in the cods column in munics_code.
Something like...
data_rais = data.frame(postal_code = c(2049253, 2033069, 2293513, 2411920, 2284937, 2341811, 2008638,
2279827, NA, 2386135, 2441900, 2392889, 2332114, 2254610,
2114414, 2089509, 2351781, 2451466, 2111632, 2070417, 2079485,
2328146, 2200329, 2116103, NA, 2449114, 2231708, NA,
NA, 2194253),
munic_name = NA)
Note: I cannot delete the NAs, I don't want to lose them.
Upvotes: 0
Views: 67
Reputation: 1381
I would suggest you use match:
data_rais$munic_name = munics_code[[1]][match(data_rais$munic, munics_code$cods)]
To fill in only the rows that have a postal_code, leaving the other rows of data_rais untouched, use the following:
data_rais$munic_name[!is.na(data_rais$postal_code)] = munics_code[[1]][match(data_rais$munic[!is.na(data_rais$postal_code)], munics_code$cods)]
Not sure if you need the second approach, but be careful about overwriting original variables. If you're unsure, add another variable and inspect the matching manually for a few entries.
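For illustration, here is a minimal, self-contained sketch of the match() lookup. It assumes data_rais has a munic column holding the municipality code; the sample in the question does not show one, so the munic values below are invented, and the lookup table is abbreviated.

```r
# Lookup table, abbreviated from the question
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MANGARATIBA"),
  cods      = c(261, 250, 2386),
  stringsAsFactors = FALSE
)

# Hypothetical data: the `munic` values are invented for this example
data_rais <- data.frame(
  postal_code = c(2611234, NA, 2386135, 2049253),
  munic       = c(261, 250, 2386, 204),
  munic_name  = NA_character_,
  stringsAsFactors = FALSE
)

# Rows that actually have a postal code
has_pc <- !is.na(data_rais$postal_code)

# match() gives, for each code, its row position in the lookup (NA if absent)
idx <- match(data_rais$munic[has_pc], munics_code$cods)

# One vectorised assignment replaces the whole row-by-row loop
data_rais$munic_name[has_pc] <- munics_code$municipio[idx]

data_rais$munic_name
```

In this sketch the second row keeps its NA because its postal_code is missing, and the fourth row stays NA because 204 does not appear in the lookup table. No rows are dropped.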
Upvotes: 3
Reputation: 9313
If I interpreted your code correctly, you are trying to set the data_rais$munic_name column to the corresponding municipio. This could be done with a merge:
df = merge(x = data_rais, y = munics_code, by.x = "postal_code", by.y = "cods", all.x = TRUE)
By doing a left merge (all.x = TRUE) you'll preserve the NAs in data_rais. Assign the merge to data_rais if you want to add this column to it.
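As a rough sketch of how a left merge behaves, the example below joins on a hypothetical munic key column (not shown in the question's sample, so its values are invented) and uses an abbreviated lookup table. The row_id helper column is also an addition for illustration: merge() sorts the result by the key by default, so it is used to restore the original order.

```r
# Lookup table, abbreviated from the question
munics_code <- data.frame(
  municipio = c("BELFORD ROXO", "DUQUE DE CAXIAS", "MANGARATIBA"),
  cods      = c(261, 250, 2386),
  stringsAsFactors = FALSE
)

# Hypothetical data: the `munic` key column is invented for this example
data_rais <- data.frame(
  postal_code = c(2611234, NA, 2386135, 2049253),
  munic       = c(261, 250, 2386, 204),
  stringsAsFactors = FALSE
)

# Remember the original order, because merge() sorts on the key by default
data_rais$row_id <- seq_len(nrow(data_rais))

# Left join: all.x = TRUE keeps every row of data_rais, matched or not
df <- merge(x = data_rais, y = munics_code,
            by.x = "munic", by.y = "cods", all.x = TRUE)

# Restore the original row order
df <- df[order(df$row_id), ]

df[, c("postal_code", "munic", "municipio")]
```

All four rows survive the join, including the one with a missing postal_code and the one whose code has no entry in the lookup (its municipio is NA). Note that, unlike the loop in the question, the merge does not skip rows where postal_code is NA: it fills municipio whenever the key matches.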
Upvotes: 1