Reputation: 11

R: extracting a subset from a data frame based on a regular expression applied to the column names

I have an R data frame that I want to filter (creating a subset) on the basis of the column names.

The data frame:

df <- data.frame(x = 1:4, "A-1" = rnorm(4, 11, 4.4), "A-2" = rnorm(4, 11, 4.4), "B-2" = rnorm(4, 11, 4.4))

x   A.1         A.2         B.2
1   8.704004    17.505799   12.025182
2   12.293454   9.452140    10.628045
3   12.100977   3.614021    8.216995
4   9.197816    13.717085   7.203580

Ideally, the selection for the new data frame should be driven by a regular expression: for example, all the columns whose names have "A" as the first character, or alternatively "2" as the last one.

Thank you

Upvotes: 1

Answers (2)

Ronak Shah

Reputation: 389275

In base R, we can use startsWith and endsWith with a prefix and a suffix, respectively. Both match fixed strings (not regular expressions) and return logical vectors, which can be combined with | to select the columns whose names either start with "A" or end with "2".

df[,startsWith(names(df), 'A') | endsWith(names(df), '2')]

#    A.1    A.2   B.2
#1 19.05 11.347 11.03
#2 12.46  7.204 10.09
#3 23.72  8.497 16.13
#4 11.54  2.724 17.61
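
Since the question asks for a regular expression, here is a minimal sketch of the same selection written as one pattern with grepl. The fixed values in df are made up for reproducibility (the question uses rnorm), and keep and subset_df are illustrative names only.

```r
# Same column names as in the question, with fixed values instead of rnorm()
df <- data.frame(x = 1:4,
                 A.1 = c(8.7, 12.3, 12.1, 9.2),
                 A.2 = c(17.5, 9.5, 3.6, 13.7),
                 B.2 = c(12.0, 10.6, 8.2, 7.2))

# One regular expression: names that start with "A" OR end with "2"
keep <- grepl("^A|2$", names(df))

# drop = FALSE keeps the result a data frame even if only one column matches
subset_df <- df[, keep, drop = FALSE]

names(subset_df)
```

For this data the logical vector should be the same one that startsWith(...) | endsWith(...) produces; the regex form is mainly useful once the condition gets more complicated than a fixed prefix or suffix.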

Upvotes: 1

Aron Strandberg

Reputation: 3090

In base R you can regex-select columns like this:

# A as first character
df[grep("^A", names(df))] 

# 2 as last character
df[grep("2$", names(df))]
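
As a small extension of the grep calls above, the sketch below shows two variations that may be useful: requiring both conditions at once, and excluding the matching columns. The values in df are made up (the question uses rnorm), and both and not_a are illustrative names.

```r
# Same column names as in the question, with fixed values for reproducibility
df <- data.frame(x = 1:4,
                 A.1 = c(8.7, 12.3, 12.1, 9.2),
                 A.2 = c(17.5, 9.5, 3.6, 13.7),
                 B.2 = c(12.0, 10.6, 8.2, 7.2))

# value = TRUE returns the matching names instead of their positions,
# which is handy for checking a pattern before subsetting.
# "^A.*2$" means: starts with A AND ends with 2
both <- grep("^A.*2$", names(df), value = TRUE)

# invert = TRUE keeps the columns whose names do NOT match the pattern
not_a <- df[grep("^A", names(df), invert = TRUE)]

both
names(not_a)
```

Here both should contain only "A.2", and not_a should keep the x and B.2 columns.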

The dplyr equivalent is:

library(dplyr)
df %>%
  select(matches("^A"))

df %>%
  select(matches("2$"))
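
To get "starts with A or ends with 2" in a single dplyr call, something along these lines should work. This is a sketch that assumes dplyr 1.0 or later for the | operator between selection helpers; the values in df are made up and either, same, and strict are illustrative names.

```r
library(dplyr)

# Same column names as in the question, with fixed values for reproducibility
df <- data.frame(x = 1:4,
                 A.1 = c(8.7, 12.3, 12.1, 9.2),
                 A.2 = c(17.5, 9.5, 3.6, 13.7),
                 B.2 = c(12.0, 10.6, 8.2, 7.2))

# One regular expression covering both conditions
either <- df %>% select(matches("^A|2$"))

# Equivalent selection with the fixed-string helpers (assumes dplyr >= 1.0)
same <- df %>% select(starts_with("A") | ends_with("2"))

# matches() ignores case by default; make it case-sensitive if that matters
strict <- df %>% select(matches("^A", ignore.case = FALSE))

names(either)
names(strict)
```

Note that matches("^A") with the default ignore.case = TRUE would also pick up a column starting with a lowercase "a", which grep("^A", ...) in base R would not.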

Upvotes: 1
