Reputation: 83
I would like to know how to delete the rows where the values in two different columns are not the same. Here is an example to illustrate my problem:
test.1 <- c("A", "B", "C", "D")
test.2 <- c("2009-02", "2009-04", "2010-01", "2011-02")
test.3 <- c("2009-02", "2009-08", "2010-01", "2013-06")
test.data <- data.frame(test.1, test.2, test.3)
which gives:
test.1 test.2 test.3
1 A 2009-02 2009-02
2 B 2009-04 2009-08
3 C 2010-01 2010-01
4 D 2011-02 2013-06
I would like to delete the rows where test.2 & test.3 are not equal, i.e. the second and fourth rows. I tried with the function duplicated as I found that
test.data.2 = test.data[!duplicated(test.data[,c('test.2', 'test.3')]),]
would remove the rows where test.2 = test.3. Therefore, I removed the "!" as follows:
test.data.2 = test.data[duplicated(test.data[,c('test.2', 'test.3')]),]
but it is not working. Would you have any other suggestions? Thank you very much for your help
Upvotes: 1
Views: 495
Reputation: 5660
You can create a new data frame that keeps only the rows where test.2 and test.3 are the same:
test.data.2 <- test.data[test.data$test.2 == test.data$test.3,]
Likewise, you can drop the rows where they are not the same:
test.data.2 <- test.data[-which(test.data$test.2 != test.data$test.3),]
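One caveat with the negative-index form may be worth a quick sketch: when no rows differ, which() returns an empty vector, and indexing with -integer(0) selects zero rows instead of all of them. The snippet below rebuilds the question's data with stringsAsFactors = FALSE (an assumption, so the columns are plain character vectors on any R version); all.same is a hypothetical name used only for illustration.

```r
# Rebuild the example data; stringsAsFactors = FALSE is an assumption so that
# the columns are character vectors regardless of the R version.
test.data <- data.frame(
  test.1 = c("A", "B", "C", "D"),
  test.2 = c("2009-02", "2009-04", "2010-01", "2011-02"),
  test.3 = c("2009-02", "2009-08", "2010-01", "2013-06"),
  stringsAsFactors = FALSE
)

# Element-wise comparison gives one logical value per row.
same <- test.data$test.2 == test.data$test.3
kept <- test.data[same, ]          # rows A and C

# Hypothetical edge case: a data frame in which every row already matches.
all.same <- kept

# which() finds nothing to drop, so the negative index is empty and
# zero rows are returned instead of all rows.
via.which <- all.same[-which(all.same$test.2 != all.same$test.3), ]

# Negating the logical vector keeps every row, as intended.
via.logical <- all.same[!(all.same$test.2 != all.same$test.3), ]
```

Here via.which ends up with no rows while via.logical keeps both, so the plain logical comparison is probably the safer default.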
Upvotes: 1
Reputation: 30474
With tidyverse:
library(tidyverse)
test.data %>%
  filter(test.2 == test.3)
Or base R:
test.data[test.data$test.2 == test.data$test.3,]
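As for why the duplicated attempt in the question returns nothing: duplicated flags a row only when the same combination of values already appeared in an earlier row. It does not compare the columns within a row. A minimal sketch on the question's data (stringsAsFactors = FALSE is an assumption; pairs and dup are illustrative names):

```r
test.data <- data.frame(
  test.1 = c("A", "B", "C", "D"),
  test.2 = c("2009-02", "2009-04", "2010-01", "2011-02"),
  test.3 = c("2009-02", "2009-08", "2010-01", "2013-06"),
  stringsAsFactors = FALSE
)

# duplicated() asks "has this (test.2, test.3) pair occurred in an earlier row?"
pairs <- test.data[, c("test.2", "test.3")]
dup <- duplicated(pairs)           # all FALSE: no pair repeats

# Indexing with an all-FALSE vector therefore returns zero rows.
from.duplicated <- test.data[dup, ]

# The row-wise comparison is what actually answers the question.
from.comparison <- test.data[test.data$test.2 == test.data$test.3, ]
```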
Upvotes: 1
Reputation: 887118
We can use subset from base R to keep only the matching rows:
subset(test.data, test.2 == test.3)
Or using dplyr:
library(dplyr)
test.data %>%
  slice(which(test.2 == test.3))
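If the columns might contain missing values, the base R approaches behave differently, which a small sketch can show. The data frame na.data below is hypothetical (it is not part of the question) and exists only to illustrate the NA case.

```r
# Hypothetical data with a missing value in test.3 (not from the question).
na.data <- data.frame(
  test.1 = c("A", "B", "C"),
  test.2 = c("2009-02", "2009-04", "2010-01"),
  test.3 = c("2009-02", NA, "2010-01"),
  stringsAsFactors = FALSE
)

# Logical indexing: the comparison is NA for row B, and an NA index
# produces a row filled with NA values.
by.bracket <- na.data[na.data$test.2 == na.data$test.3, ]

# subset() treats an NA condition as FALSE and drops the row.
by.subset <- subset(na.data, test.2 == test.3)

# which() also ignores NA, so only the TRUE positions are kept.
by.which <- na.data[which(na.data$test.2 == na.data$test.3), ]
```

With this input, by.bracket has three rows (the middle one all NA), while by.subset and by.which each keep only rows A and C.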
Upvotes: 1