Sara Liu

R: how to mutate a new column that has the same length as the data frame

I am trying to create a new column that converts FIPS codes to state abbreviations using library(usmap). The problem is that the new column created with mutate does not match the size of the data: fips_info returns only 51 rows, not the 23,570 rows of the original tibble.

Appreciate any help, thanks!

library(usmap)
library(dplyr)

# Helper function to get the state abbreviation for FIPS codes
fips_function <- function(fips_code){
  return(fips_info(fips_code)$abbr)
}

atus_19_selected <- act_19 %>%
  mutate(state_abb = fips_function(GESTFIPS))

Error: Problem with `mutate()` input `state_abb`.
x Input `state_abb` can't be recycled to size 23570.
ℹ Input `state_abb` is `fips_function(GESTFIPS)`.
ℹ Input `state_abb` must be size 23570 or 1, not 51.


atus_19_selected
# A tibble: 23,570 x 8
   GESTFIPS GTCO  TUCASEID t150701 t150799 t150801 t150899 t159999
      <dbl> <chr>    <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1       40 000    2.02e13      60       0       0       0       0
 2       51 153    2.02e13      40       0       0       0     260

Answers (2)

akrun

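The issue is that GESTFIPS contains duplicate values, and fips_info returns only one row per unique code, so the result is shorter than the data and mutate raises the error. One option is rowwise, which calls fips_function on one value at a time: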

library(usmap)
library(dplyr)
act_19 %>%
     rowwise %>%
     mutate(state_abb = fips_function(GESTFIPS)) %>%
     ungroup

-output

# A tibble: 3 x 2
#  GESTFIPS state_abb
#     <dbl> <chr>    
#1       40 OK       
#2       51 VA       
#3       40 OK       

Another option is to run the lookup on the distinct values of 'GESTFIPS' and then join the result back to the data:

act_19 %>%
    distinct(GESTFIPS) %>%
    mutate(state_abb = fips_function(GESTFIPS)) %>%
    right_join(act_19)

-output

# A tibble: 3 x 2
#  GESTFIPS state_abb
#     <dbl> <chr>    
#1       40 OK       
#2       40 OK       
#3       51 VA    
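The distinct-then-join version still attaches state_abb by position, so it relies on fips_info returning the codes in the same order that distinct() lists them. That holds for this data because 40 happens to appear first. The base-R sketch below assumes a hypothetical two-row `lookup` table and a hypothetical `fake_fips_info()` that only mimics the `df[df$fips %in% fips, ]` subsetting in the get_fips_info source shown further down (it is not usmap's code). It shows what goes wrong when 51 appears first, and a key-based alternative using match().

```r
# Hypothetical two-row stand-in for usmap's state lookup table
lookup <- data.frame(abbr = c("OK", "VA"), fips = c("40", "51"),
                     stringsAsFactors = FALSE)

# Hypothetical mimic of the subsetting in get_fips_info(): rows come back
# in table order, once per matching code, whatever the input order
fake_fips_info <- function(fips) lookup[lookup$fips %in% fips, ]

codes <- unique(c("51", "40", "51"))  # first-seen order: "51", "40"

# Positional assignment: 51 gets "OK" and 40 gets "VA", which is wrong
positional <- data.frame(GESTFIPS = codes,
                         state_abb = fake_fips_info(codes)$abbr,
                         stringsAsFactors = FALSE)

# Key-based assignment: match() finds each code's own row in the lookup
keyed <- data.frame(GESTFIPS = codes,
                    state_abb = lookup$abbr[match(codes, lookup$fips)],
                    stringsAsFactors = FALSE)
```

Joining on the code column, as in the other answer, avoids the same ordering problem.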

The error can be reproduced with the example data (given at the end), which contains a duplicated value:

act_19 %>%
   mutate(state_abb = fips_function(GESTFIPS))

Error: Problem with `mutate()` input `state_abb`.
✖ Input `state_abb` can't be recycled to size 3.
ℹ Input `state_abb` is `fips_function(GESTFIPS)`.
ℹ Input `state_abb` must be size 3 or 1, not 2.
Run `rlang::last_error()` to see where the error occurred.

The issue comes directly from the subsetting step inside usmap's internal get_fips_info function:

usmap:::get_fips_info
function (fips) 
{
    if (all(nchar(fips) == 2)) {
        df <- utils::read.csv(system.file("extdata", "state_fips.csv", 
            package = "usmap"), colClasses = rep("character", 
            3), stringsAsFactors = FALSE)
        result <- df[df$fips %in% fips, ]  # -> keeps one row per matching fips, in table order
        
...

Data used in the examples:

act_19 <- tibble(GESTFIPS = c(40, 51, 40))
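A minimal base-R sketch of the subsetting issue, assuming a hypothetical two-row `lookup` table in place of usmap's state_fips.csv: subsetting the table with %in% collapses the duplicated code, while match() returns one position per input value, which is the length mutate() needs.

```r
# Hypothetical two-row stand-in for usmap's state_fips.csv lookup table
lookup <- data.frame(abbr = c("OK", "VA"), fips = c("40", "51"),
                     stringsAsFactors = FALSE)
codes <- c("40", "51", "40")  # same codes as the example data, as strings

# Subsetting the table, as in get_fips_info(), returns one row per
# matching table entry, so the duplicated "40" is collapsed
subset_rows <- lookup[lookup$fips %in% codes, ]
nrow(subset_rows)  # 2, but mutate() needs 3 (or 1)

# match() returns one row index per input value, duplicates included
idx <- match(codes, lookup$fips)
lookup$abbr[idx]   # "OK" "VA" "OK"
```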

r2evans

When fips_info cannot match a FIPS code, it drops that entry from the result (with a warning), so you cannot rely on a 1-to-1 relationship between input and output.

Using a known-bad code (99) highlights this:

act_19 <- structure(list(GESTFIPS = c(40L, 99L), GTCO = c(0L, 153L), TUCASEID = c(2.02e+13, 2.02e+13), t150701 = c(60L, 40L), t150799 = c(0L, 0L), t150801 = c(0L, 0L), t150899 = c(0L, 0L), t159999 = c(0L, 260L)), class = "data.frame", row.names = c("1", "2"))

usmap::fips_info(act_19$GESTFIPS)
# Error in fips_info.numeric(act_19$GESTFIPS) : 
#   Invalid FIPS code(s), must be either 2 digit (states) or 5 digit (counties), but not both.

usmap::fips_info(as.character(act_19$GESTFIPS))
# Warning in get_fips_info(fips_) :
#   FIPS code(s) 99 not found, excluded from result.
#   abbr fips     full
# 1   OK   40 Oklahoma

I suggest an alternative: look up only the unique codes, then join on the code itself so that every original row is kept:

abbrs <- usmap::fips_info(as.character(unique(act_19$GESTFIPS)))
# Warning in get_fips_info(fips_) :
#   FIPS code(s) 99 not found, excluded from result.
abbrs
#   abbr fips     full
# 1   OK   40 Oklahoma

library(dplyr)
abbrs %>%
  transmute(state_abb = abbr, GESTFIPS = as.integer(fips)) %>%
  right_join(act_19, by = "GESTFIPS")
#   state_abb GESTFIPS GTCO TUCASEID t150701 t150799 t150801 t150899 t159999
# 1        OK       40    0 2.02e+13      60       0       0       0       0
# 2      <NA>       99  153 2.02e+13      40       0       0       0     260

Rows whose code could not be matched will have NA in state_abb, but you retain all of your original data and won't get that error.
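If you prefer to avoid the join, a vectorized lookup with match() also keeps one result per row and gives NA for unknown codes. This is a sketch under assumptions: `lookup` is a hypothetical data frame with `abbr` and `fips` columns (fips stored as 2-character strings, as in the fips_info output above), and `fips_to_abbr()` is a hypothetical helper. The sprintf("%02d", ...) padding is there because GESTFIPS is stored as a number while the lookup keys are strings such as "01".

```r
# Hypothetical stand-in for the table fips_info() returns
lookup <- data.frame(abbr = c("OK", "VA"), fips = c("40", "51"),
                     stringsAsFactors = FALSE)

# Hypothetical helper: one abbreviation per input value, in input order,
# NA where the code is not in the lookup
fips_to_abbr <- function(codes, lookup) {
  # Numeric codes such as 40 or 1 become "40" or "01" to match the keys
  keys <- if (is.numeric(codes)) sprintf("%02d", as.integer(codes)) else codes
  lookup$abbr[match(keys, lookup$fips)]
}

act_19 <- data.frame(GESTFIPS = c(40, 51, 40, 99))
act_19$state_abb <- fips_to_abbr(act_19$GESTFIPS, lookup)
act_19$state_abb  # "OK" "VA" "OK" NA
```

With usmap installed, `lookup` could presumably be built once from `usmap::fips_info(sprintf("%02d", as.integer(unique(act_19$GESTFIPS))))`; codes it cannot find are left out of that table with a warning, and match() then returns NA for those rows.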