Mehdi.K

Reputation: 371

Subset dataframe based on factor levels starting with specific character or string of characters

I am trying to subset a dataframe based on a factor (here, ID). What I would like is to subset based on factor levels starting with a specific character. Here is an example dataframe:

ID = c("100", "100a", "101", "103", "204", "206", "207", "207a", "207b")
Value = rnorm(9)
df = data.frame(ID, Value, stringsAsFactors = TRUE) # ID is stored as a factor

I would like to end up with two separate dataframes, one with the IDs starting with "1" and one with the IDs starting with "2". My real dataframe is much longer than the one provided, so I cannot subset based on a list of factor levels. I have seen this done with a continuous variable, but I haven't found an example with factors.

Thank you for your help!

Upvotes: 3

Views: 4136

Answers (1)

Gavin

Reputation: 1123

In my mind, startsWith offers a nice, clean interface. However, there is one caveat.

You might naively think you can use it as follows:

ones <- df[startsWith(df$ID, '1'),]

However, you end up with the following error:

Error in startsWith(df$ID, "1") : non-character object(s)

This happens because ID is a factor and startsWith only accepts character vectors. Converting the factor to character first leads to the following syntax:

ones <- df[startsWith(as.character(df$ID), '1'),]
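Putting this together for both groups, a minimal sketch might look like the following. It assumes the example data from the question; the seed, the helper variable id_chr, and the droplevels() step are additions for illustration and are not part of the original answer.

```r
# Rebuild the example data. stringsAsFactors = TRUE makes ID a factor
# regardless of the default in the running R version.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
df <- data.frame(
  ID = c("100", "100a", "101", "103", "204", "206", "207", "207a", "207b"),
  Value = rnorm(9),
  stringsAsFactors = TRUE
)

# startsWith() requires character input, so convert the factor once.
id_chr <- as.character(df$ID)  # id_chr is a hypothetical helper name

ones <- df[startsWith(id_chr, "1"), ]
twos <- df[startsWith(id_chr, "2"), ]

# Row subsetting keeps every original factor level;
# droplevels() removes the levels that no longer occur.
ones <- droplevels(ones)
twos <- droplevels(twos)
```

Whether to call droplevels() depends on what you do next: without it, functions such as table() or levels() still report the IDs from the other group.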

Another option is to pass stringsAsFactors = FALSE when you create the data frame (this is the default from R 4.0.0 onward). ID then stays a character vector, so the first form works as written, and you can convert ID back to a factor in the subset data frames if needed.
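A short sketch of this second approach, again assuming the example data from the question. The name df_chr is hypothetical and is used only to keep this data frame separate from the factor version.

```r
# Keep ID as a character column when building the data frame.
df_chr <- data.frame(
  ID = c("100", "100a", "101", "103", "204", "206", "207", "207a", "207b"),
  Value = rnorm(9),
  stringsAsFactors = FALSE
)

# No as.character() needed: ID is already a character vector.
ones <- df_chr[startsWith(df_chr$ID, "1"), ]
twos <- df_chr[startsWith(df_chr$ID, "2"), ]

# Convert back to factors only if later code needs them.
# Each factor then contains just the levels present in its subset.
ones$ID <- factor(ones$ID)
twos$ID <- factor(twos$ID)
```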

Upvotes: 2
