hiperhiper

Reputation: 331

Match rows and remove values from a cell if condition is met

I have a data.frame such as

data = data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                  family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                  species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"))

What I'm trying to do is: when the names in the family and species columns match within a row, keep the name in family and set the corresponding species cell to NA. I tried a loop, but that doesn't seem like a good approach since my data is pretty big...

Upvotes: 2

Views: 36

Answers (2)

VvdL

Reputation: 3210

Using base R, you can assign NA to the species column after filtering for your use case:

data <- data.frame(plot = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
                   family = c("Fab", "Fab", "Fab", "Pip", "Fab", "Mel", "Myr", "Myr", "Fab"),
                   species = c("Fab", "Fab", "sp 1", "sp2", "Fab", "sp3", "sp4", "sp5", "sp1"), 
                   stringsAsFactors = FALSE)

data[data$family == data$species, ]$species <- NA
data
#>   plot family species
#> 1    1    Fab    <NA>
#> 2    1    Fab    <NA>
#> 3    1    Fab    sp 1
#> 4    2    Pip     sp2
#> 5    2    Fab    <NA>
#> 6    3    Mel     sp3
#> 7    3    Myr     sp4
#> 8    3    Myr     sp5
#> 9    3    Fab     sp1
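
One caveat, as far as I can tell: the `data[cond, ]$species <- NA` form tends to error when no row matches or when either column already contains NA. A minimal sketch of a variant that indexes the column vector directly is below; the small data frame in it is made up purely for illustration and is not the question's data.

```r
# Hypothetical data with missing values (not from the question)
data <- data.frame(plot = c(1, 1, 2, 3),
                   family = c("Fab", "Pip", NA, "Myr"),
                   species = c("Fab", "sp2", "sp3", NA),
                   stringsAsFactors = FALSE)

# which() drops comparisons that evaluate to NA, so only rows where
# family and species are both present and equal are selected
matches <- which(data$family == data$species)

# Assigning into the column vector is a no-op when nothing matches
data$species[matches] <- NA
data
```

Because `which()` returns integer positions, the assignment should behave the same whether there are zero, one, or many matching rows.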

Upvotes: 3

HoelR

Reputation: 6563

library(tidyverse)

# df is the question's data converted to a tibble
df <- as_tibble(data)

df %>%  
  mutate(species = case_when(species == family ~ NA_character_, 
                             TRUE ~ species))

# A tibble: 9 × 3
   plot family species
  <dbl> <chr>  <chr>  
1     1 Fab    NA     
2     1 Fab    NA     
3     1 Fab    sp 1   
4     2 Pip    sp2    
5     2 Fab    NA     
6     3 Mel    sp3    
7     3 Myr    sp4    
8     3 Myr    sp5    
9     3 Fab    sp1    

Upvotes: 2
