Reputation: 25
I have a dataframe that looks like this (but for every US county)
Countyname | Neighbour County | Neighbour State |
---|---|---|
Autauga County, AL | Chilton County | AL |
Autauga County, AL | Dallas County | AL |
Baldwin County, AL | Escambia County | FL |
Catron County, NM | Apache County | AZ |
For all counties, if the neighbouring county is in the same state I want to replace the value in the Neighbour State column with a missing value NA, and if it's in a different state I want to keep it unchanged. I.e. I want to end up with something like this:
Countyname | Neighbour County | Neighbour State |
---|---|---|
Autauga County, AL | Chilton County | NA |
Autauga County, AL | Dallas County | NA |
Baldwin County, AL | Escambia County | FL |
Catron County, NM | Apache County | AZ |
I was thinking of looping through each row and if the column 'Countyname' contains the entry in the column 'Neighbour State', replace the entry with NA (e.g. for the first row if 'Autauga County, AL' contains 'AL', replace 'AL' with NA). How would I go about this (or is there a more efficient way to do it as this feels clunky)?
Upvotes: 0
Views: 1799
Reputation: 9858
You can use str_detect()
and ifelse()
library(dplyr)
library(stringr)
df%>%
mutate(Neighbour_State=ifelse(str_detect(Countyname, Neighbour_State), NA, Neighbour_State))
Or, preferably, str_detect
and replace()
df%>%
mutate(Neighbour_State=replace(Neighbour_State, str_detect(Countyname, Neighbour_State), NA))
#Or with pipes:
df%>%
mutate(Neighbour_State=Neighbour_State%>%replace(., str_detect(Countyname, .), NA))
Countyname Neighbour_County Neighbour_State
1 Autauga County, AL Chilton County <NA>
2 Autauga County, AL Dallas County <NA>
3 Baldwin County, AL Escambia County FL
4 Catron County, NM Apache County AZ
Upvotes: 1
Reputation: 2906
You don't need loop for this. You can use dplyr
package instead:
data <- data %>%
mutate(`Neighbour State` = ifelse(substr(Countyname, nchar(Countyname)-1, nchar(Countyname))== `Neighbour State`, NA,`Neighbour State`))
Upvotes: 0
Reputation: 6206
library(stringr)
dat %>%
dplyr::mutate(
Neighbour.State = dplyr::if_else(
str_trim(unlist(str_split(dat$Countyname[1], ","))[2]) == Neighbour.State, "NA", Neighbour.State)
)
Countyname Neighbour.County Neighbour.State
1 Autauga County, AL Chilton County NA
2 Autauga County, AL Dallas County NA
3 Baldwin County, AL Escambia County FL
4 Catron County, NM Apache County AZ
Upvotes: 0
Reputation: 26218
You may also use regex
to find stateName from the county and use function na_if
to replace values where it matches with the neighbour_county. Like this
'.*\\,\\s'
will match everything upto a comma and a whitespace and replace it with ''
i.e. nothing.ifelse
na_if
replace
case_when
or any conditional function.df <- read.table(header = T, text = "Countyname Neighbour_County Neighbour_State
'Autauga County, AL' 'Chilton County' AL
'Autauga County, AL' 'Dallas County' AL
'Baldwin County, AL' 'Escambia County' FL
'Catron County, NM' 'Apache County' AZ")
library(dplyr, warn.conflicts = F)
df %>%
mutate(Neighbour_State = na_if(gsub('.*\\,\\s','', Countyname, perl = T), Neighbour_State))
#> Countyname Neighbour_County Neighbour_State
#> 1 Autauga County, AL Chilton County <NA>
#> 2 Autauga County, AL Dallas County <NA>
#> 3 Baldwin County, AL Escambia County AL
#> 4 Catron County, NM Apache County NM
Created on 2021-07-03 by the reprex package (v2.0.0)
Upvotes: 0