I have a dataframe that looks like this (but for every US county) Countyname Neighbour County Neighbour State Autauga County, AL Chilton County AL Autauga County, AL Dallas County AL Baldwin County, AL Escambia County FL Catron County, NM Apache County AZ For all counties, if the neighbouring county is in the same state I want to replace the value in the Neighbour State column with a missing value NA, and if it's in a different state I want to keep it unchanged. I.e. I want to end up with something like this: Countyname Neighbour County Neighbour State Autauga County, AL Chilton County NA Autauga County, AL Dallas County NA Baldwin County, AL Escambia County FL Catron County, NM Apache County AZ I was thinking of looping through each row and if the column 'Countyname' contains the entry in the column 'Neighbour State', replace the entry with NA (e.g. for the first row if 'Autauga County, AL' contains 'AL', replace 'AL' with NA). How would I go about this (or is there a more efficient way to do it as this feels clunky)?

Reputation: 25

Replace strings in one column which match another column

I have a dataframe that looks like this (but for every US county)

Countyname	Neighbour County	Neighbour State
Autauga County, AL	Chilton County	AL
Autauga County, AL	Dallas County	AL
Baldwin County, AL	Escambia County	FL
Catron County, NM	Apache County	AZ

For all counties, if the neighbouring county is in the same state I want to replace the value in the Neighbour State column with a missing value NA, and if it's in a different state I want to keep it unchanged. I.e. I want to end up with something like this:

Countyname	Neighbour County	Neighbour State
Autauga County, AL	Chilton County	NA
Autauga County, AL	Dallas County	NA
Baldwin County, AL	Escambia County	FL
Catron County, NM	Apache County	AZ

I was thinking of looping through each row and if the column 'Countyname' contains the entry in the column 'Neighbour State', replace the entry with NA (e.g. for the first row if 'Autauga County, AL' contains 'AL', replace 'AL' with NA). How would I go about this (or is there a more efficient way to do it as this feels clunky)?

Upvotes: 0

Answers (4)

GuedesBF

Reputation: 9858

You can use str_detect() and ifelse()

library(dplyr)
library(stringr)

df%>%
        mutate(Neighbour_State=ifelse(str_detect(Countyname, Neighbour_State), NA, Neighbour_State))

Or, preferably, str_detect and replace()

df%>%
        mutate(Neighbour_State=replace(Neighbour_State, str_detect(Countyname, Neighbour_State), NA))

#Or with pipes:

df%>%
        mutate(Neighbour_State=Neighbour_State%>%replace(., str_detect(Countyname, .), NA))

          Countyname Neighbour_County Neighbour_State
1 Autauga County, AL   Chilton County            <NA>
2 Autauga County, AL    Dallas County            <NA>
3 Baldwin County, AL  Escambia County              FL
4  Catron County, NM    Apache County              AZ

Upvotes: 1

MonJeanJean

Reputation: 2906

You don't need loop for this. You can use dplyr package instead:

data <- data %>%
  mutate(`Neighbour State` = ifelse(substr(Countyname, nchar(Countyname)-1, nchar(Countyname))== `Neighbour State`, NA,`Neighbour State`))

Upvotes: 0

user438383

Reputation: 6206

library(stringr)
dat %>% 
    dplyr::mutate(
        Neighbour.State = dplyr::if_else(
            str_trim(unlist(str_split(dat$Countyname[1], ","))[2]) == Neighbour.State, "NA", Neighbour.State)
        )

          Countyname Neighbour.County Neighbour.State
1 Autauga County, AL   Chilton County              NA
2 Autauga County, AL    Dallas County              NA
3 Baldwin County, AL  Escambia County              FL
4  Catron County, NM    Apache County              AZ

Upvotes: 0

AnilGoyal

Reputation: 26218

You may also use regex to find stateName from the county and use function na_if to replace values where it matches with the neighbour_county. Like this

regex '.*\\,\\s' will match everything upto a comma and a whitespace and replace it with '' i.e. nothing.
use mutate in combination with ifelse na_if replace case_when or any conditional function.

df <- read.table(header = T, text = "Countyname Neighbour_County    Neighbour_State
'Autauga County, AL'    'Chilton County'    AL
'Autauga County, AL'    'Dallas County' AL
'Baldwin County, AL'    'Escambia County'   FL
'Catron County, NM' 'Apache County' AZ")

library(dplyr, warn.conflicts = F)

df %>% 
  mutate(Neighbour_State = na_if(gsub('.*\\,\\s','', Countyname, perl = T), Neighbour_State))
#>           Countyname Neighbour_County Neighbour_State
#> 1 Autauga County, AL   Chilton County            <NA>
#> 2 Autauga County, AL    Dallas County            <NA>
#> 3 Baldwin County, AL  Escambia County              AL
#> 4  Catron County, NM    Apache County              NM

^{Created on 2021-07-03 by the reprex package (v2.0.0)}

Upvotes: 0

Replace strings in one column which match another column

Answers (4)

Related Questions