Y. Kravets

Reputation: 11

Why does dplyr::filter() include rows for which the condition is FALSE?

Here is something I am struggling to understand. Imagine I have a data frame with two columns:

**Year**      **Date**
1925          1925-01-02
1941          1925-02-03
1990          1990-01-02
1956          NA
1990          1990-01-02
2002          2004-01-02

I am trying to filter out all entries where the value in the **Year** column does not match the year in the **Date** column.

So I have written a small parser for the **Date** column (the real dataset is much larger):

dateParser <- function(date) {
  dateStr <- toString(date)
  yearStr <- strsplit(dateStr, "-")[[1]][1]
  yearInt <- as.integer(yearStr)

  return(yearInt)
}

And subsequently I am using dplyr::filter() to filter those occurrences out:

noMismatch <- dplyr::filter(data, as.integer(data$Year) == dateParser(data$Date))

Yet I still see some rows in the resulting data frame where the years do not match. Why?

P.S. Assume that I don't care about NA values in the **Date** column: whenever an NA occurs, I simply leave that row in.

Upvotes: 1

Views: 204

Answers (1)

Laurent

Reputation: 2043

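This happens because your

dateParser(data$Date)

is not vectorized. toString() collapses the whole Date column into a single comma-separated string, so the function returns only the year of the first date. That single value is then recycled and compared against every value in Year.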
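A quick way to check what dateParser() returns for a whole column is to call it on a small vector. The sketch below reuses the question's parser unchanged and assumes the Date column is stored as character strings, as in the sample data; the names `dates`, `collapsed`, and `parsed` are only for illustration.

```r
# Sample Date column from the question, assumed to be character strings
dates <- c("1925-01-02", "1925-02-03", "1990-01-02", NA,
           "1990-01-02", "2004-01-02")

# The question's parser, unchanged
dateParser <- function(date) {
  dateStr <- toString(date)
  yearStr <- strsplit(dateStr, "-")[[1]][1]
  yearInt <- as.integer(yearStr)

  return(yearInt)
}

# toString() joins all elements into ONE string separated by ", "
collapsed <- toString(dates)
length(collapsed)   # a single string, not one per date

# So the parser sees only the first "-"-separated piece of that string
parsed <- dateParser(dates)
length(parsed)      # a single value, not one per row
```

Because `parsed` has length one, the comparison against Year is effectively "Year equals the first date's year" for every row.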

Try:

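library(lubridate)
library(dplyr)
# year() is vectorized: it returns one year per row.
# filter() drops rows where the condition is NA, so keep NA dates explicitly.
noMismatch <- filter(data, is.na(Date) | as.integer(Year) == year(Date))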
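If you would rather not depend on lubridate, a vectorized parser in base R is one possible alternative. This is only a sketch: `dateYear` is a hypothetical helper name, and it assumes every non-missing date is a string that starts with a four-digit year (as in "1925-01-02"). The sample data is rebuilt from the question, with Date assumed to be character.

```r
# Sample data from the question; Date is assumed to be character
data <- data.frame(
  Year = c(1925, 1941, 1990, 1956, 1990, 2002),
  Date = c("1925-01-02", "1925-02-03", "1990-01-02", NA,
           "1990-01-02", "2004-01-02"),
  stringsAsFactors = FALSE
)

# Hypothetical vectorized replacement for dateParser():
# takes the first four characters of each element, so it returns
# one year per date and NA where the date is missing
dateYear <- function(date) {
  as.integer(substr(as.character(date), 1, 4))
}

# Keep rows where the years match, plus rows with a missing Date
keep <- is.na(data$Date) | as.integer(data$Year) == dateYear(data$Date)
noMismatch <- data[keep, ]
```

The same condition can be passed to dplyr::filter() in place of the base subsetting; the important part is that the helper returns a vector as long as the column.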

Upvotes: 1
