Reputation: 11
Here is something I am struggling to understand. Imagine I have a dataframe with 2 columns:
**Year** **Date**
1925 1925-01-02
1941 1925-02-03
1990 1990-01-02
1956 NA
1990 1990-01-02
2002 2004-01-02
I am trying to filter out all entries where the value in the **Year** column does not match the year in the **Date** column.
So I have written a small parser for the Date column, assuming a much larger dataset:
dateParser <- function(date) {
  dateStr <- toString(date)
  yearStr <- strsplit(dateStr, "-")[[1]][1]
  yearInt <- as.integer(yearStr)
  return(yearInt)
}
Subsequently I am using `dplyr::filter()` to filter those occurrences out:
noMismatch <- dplyr::filter(data, as.integer(data$Year) == dateParser(data$Date))
Yet I am still seeing some rows in the resulting dataframe where the years do not match. Why?
P.S. Let's assume that I don't care about `NA` values in the **Date** column: whenever `NA` occurs, I just leave that row in.
Upvotes: 1
Views: 204
Reputation: 2043
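This probably has to do with the fact that `dateParser(data$Date)` doesn't return one year per row. `toString()` collapses the whole column into a single comma-separated string, so `strsplit(dateStr, "-")[[1]][1]` only ever yields the year of the first date, and that single value is then compared against every row's **Year**.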
Try:
library(lubridate)
library(dplyr)
noMismatch <- filter(data, as.integer(data$Year) == year(data$Date))
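If you would rather see the collapse for yourself and also keep the rows where **Date** is `NA` (as the P.S. asks), here is a minimal base-R sketch. The sample `data` frame is my own reconstruction of the table in the question, and I am assuming **Date** is stored as a `Date` column. `dateYear` is just an illustrative name.

```r
# Hypothetical sample data mirroring the table in the question.
data <- data.frame(
  Year = c(1925, 1941, 1990, 1956, 1990, 2002),
  Date = as.Date(c("1925-01-02", "1925-02-03", "1990-01-02", NA,
                   "1990-01-02", "2004-01-02"))
)

# toString() pastes every element into ONE string, so a parser built on it
# can only ever see the year of the first date.
collapsed <- toString(data$Date)
length(collapsed)  # 1

# Vectorized year extraction: one integer per row, NA stays NA.
dateYear <- as.integer(format(data$Date, "%Y"))

# Keep rows whose years agree, plus rows with a missing Date.
noMismatch <- data[is.na(dateYear) | as.integer(data$Year) == dateYear, ]
```

The same condition should also work inside dplyr, e.g. `filter(data, is.na(Date) | as.integer(Year) == year(Date))`. Without the `is.na()` part, `filter()` drops rows where the comparison evaluates to `NA`.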
Upvotes: 1