Keerthi Krutha

Reputation: 25

How to create a subset from a set of values within a column in R

I have a data frame with 62 columns and 110 rows. The column "date_observed" contains 57 distinct dates, some of which have multiple records.

I am trying to extract the rows for only 12 of these dates. They are not in any particular order.

I tried this:

datesubset <- original %>% select (original$date_observed == c("13-Jun-21","21-Jun-21", "28-Jun-21", "13-Jul-21", "20-Jul-21", "8-Aug-21", "9-Aug-21", "25-Aug-21", "31-Aug-21", "8-Sep-21", "27-Sep-21"))

But, I got the following error:

Error: Must subset columns with a valid subscript vector.
x Subscript has the wrong type logical.
i It must be numeric or character.

I did try searching here and on Google, but I could only find results on how to subset columns, not on how to keep rows with specific values in a column. I am still new to R, so please pardon me if this is a very simple question.

Upvotes: 0

Views: 229

Answers (1)

bunchofbradys

Reputation: 129

In {dplyr}, the select() function selects columns. To subset rows, use filter() instead.
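To see the difference, here is a minimal sketch on a small made-up data frame (`toy` and its columns are hypothetical, not the asker's data): select() keeps some columns and all rows, while filter() keeps all columns and some rows.

```r
library(dplyr)

# Hypothetical toy data standing in for the real data frame
toy <- data.frame(
  date_observed = c("13-Jun-21", "14-Jun-21", "21-Jun-21"),
  count = c(5, 3, 8)
)

# select() picks columns: all 3 rows remain, only date_observed is kept
cols_only <- toy %>% select(date_observed)

# filter() picks rows: both columns remain, only the matching row is kept
rows_only <- toy %>% filter(date_observed == "21-Jun-21")

print(cols_only)
print(rows_only)
```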

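The == operator also doesn't do what you want here. It compares element by element, recycling the shorter vector: the first row is compared only with the first date, the second row only with the second date, and so on. You do get one TRUE/FALSE per row, but it only tells you whether that row matches the date in the same position, not whether it matches any of your dates. With 110 rows and the 11 dates in your vector, the lengths divide evenly, so R won't even warn you about the mismatch.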
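A minimal base R sketch of that recycling, using two hypothetical vectors (`observed` and `targets` are made-up names, not from the question). Every element of `observed` is one of the target dates, yet == only flags two of them:

```r
# Hypothetical column of 4 dates and 2 target dates
observed <- c("13-Jun-21", "21-Jun-21", "21-Jun-21", "13-Jun-21")
targets  <- c("13-Jun-21", "21-Jun-21")

# == pairs elements by position, recycling the shorter vector:
# observed[1] vs targets[1], observed[2] vs targets[2],
# observed[3] vs targets[1], observed[4] vs targets[2]
eq <- observed == targets
print(eq)
# [1]  TRUE  TRUE FALSE FALSE
```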

What I think you are after is the %in% operator, which checks whether each element on the left appears anywhere on the right. It returns one TRUE or FALSE per element on the left (that is, one per row), which is exactly what filter() needs.
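The same hypothetical vectors with %in% instead of == (a base R sketch; `observed` and `targets` are illustrative names only). Now every row whose date is in `targets` is flagged, regardless of position:

```r
# Hypothetical column of 5 dates and 2 target dates
observed <- c("13-Jun-21", "21-Jun-21", "21-Jun-21", "13-Jun-21", "9-Aug-21")
targets  <- c("13-Jun-21", "21-Jun-21")

# For each element on the left: does it appear anywhere on the right?
hits <- observed %in% targets
print(hits)
# [1]  TRUE  TRUE  TRUE  TRUE FALSE

# One logical per left-hand element, so it can be used directly as an index
kept <- observed[hits]
print(kept)
```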

Inside tidyverse functions such as filter(), you don't need the $: you can refer to the column by its bare name, as in the example below.

I don't have your data to double-check, but the example below should work with your original data frame.

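library(dplyr)

# Dates whose rows should be kept
specific_dates <- c(
  "13-Jun-21",
  "21-Jun-21",
  "28-Jun-21",
  "13-Jul-21",
  "20-Jul-21",
  "8-Aug-21",
  "9-Aug-21",
  "25-Aug-21",
  "31-Aug-21",
  "8-Sep-21",
  "27-Sep-21"
)

# Keep only the rows whose date_observed is one of specific_dates
datesubset <- original %>%
  filter(date_observed %in% specific_dates)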
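For comparison, the same row subset can be taken in base R without dplyr; outside tidyverse functions the $ is needed. This is a sketch on a hypothetical stand-in for `original` (the data and the shortened `specific_dates` are made up for illustration):

```r
# Hypothetical stand-in for the asker's data frame
original <- data.frame(
  date_observed = c("13-Jun-21", "14-Jun-21", "8-Aug-21", "8-Aug-21", "1-Oct-21"),
  value = 1:5
)

# Shortened, illustrative list of dates to keep
specific_dates <- c("13-Jun-21", "8-Aug-21", "27-Sep-21")

# Logical row index: TRUE where date_observed is one of specific_dates
keep <- original$date_observed %in% specific_dates

# The index goes before the comma (rows); leaving the column slot empty keeps all columns
datesubset <- original[keep, ]

print(datesubset)
```

Note that %in% matches the strings exactly, so this assumes date_observed is stored as text in the same format as specific_dates (for example "8-Aug-21", not "08-Aug-21").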

Upvotes: 2
