Reputation: 11
I have an R data frame that I want to subset on the basis of its column names.
The data frame:
df<-data.frame( x = c(1:4), "A-1" = c(rnorm(4,11,4.4)), "A-2" = c(rnorm(4,11,4.4)), "B-2" = c(rnorm(4,11,4.4)))
x A.1 A.2 B.2
1 8.704004 17.505799 12.025182
2 12.293454 9.452140 10.628045
3 12.100977 3.614021 8.216995
4 9.197816 13.717085 7.203580
Ideally, the columns selected for the new data frame should be those whose names match a regular expression, for example all columns with "A" as the first character or, alternatively, "2" as the last one.
Thank you
Upvotes: 1
Views: 1301
Reputation: 389275
In base R, we can use startsWith and endsWith with a prefix and a suffix respectively. They return logical vectors, which can be ORed (|) to select the columns whose names either start with "A" or end with "2".
df[,startsWith(names(df), 'A') | endsWith(names(df), '2')]
# A.1 A.2 B.2
#1 19.05 11.347 11.03
#2 12.46 7.204 10.09
#3 23.72 8.497 16.13
#4 11.54 2.724 17.61
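A small reproducible check of the above, as a sketch: the seed and the names keep and sub are my own additions, not from the question. Note that data.frame() rewrites "A-1" to "A.1" by default (check.names = TRUE), so the matching is done against the dotted names.

```r
set.seed(42)  # arbitrary seed, only to make the example reproducible

df <- data.frame(x = 1:4,
                 "A-1" = rnorm(4, 11, 4.4),
                 "A-2" = rnorm(4, 11, 4.4),
                 "B-2" = rnorm(4, 11, 4.4))

names(df)  # "x" "A.1" "A.2" "B.2"

# Logical vector: TRUE for names starting with "A" or ending with "2"
keep <- startsWith(names(df), "A") | endsWith(names(df), "2")

# drop = FALSE keeps the result a data frame even if only one column matches
sub <- df[, keep, drop = FALSE]
names(sub)  # "A.1" "A.2" "B.2"
```

Unlike grep, startsWith and endsWith compare literal strings rather than regular expressions, so characters such as "." in the prefix or suffix are not treated as wildcards.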
Upvotes: 1
Reputation: 3090
In base R you can regex-select columns like this:
# A as first character
df[grep("^A", names(df))]
# 2 as last character
df[grep("2$", names(df))]
The dplyr equivalent is:
library(dplyr)
df %>%
select(matches("^A"))
df %>%
select(matches("2$"))
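If both conditions should apply at once (first character "A" or last character "2"), the two patterns can probably be combined with regex alternation. A minimal base R sketch, where the toy values and the name both are assumptions made for illustration:

```r
# Toy data frame with the same column names as in the question
df <- data.frame(x = 1:4, A.1 = 5:8, A.2 = 9:12, B.2 = 13:16)

# "^A|2$" reads as: (starts with A) OR (ends with 2)
both <- df[grepl("^A|2$", names(df))]
names(both)  # "A.1" "A.2" "B.2"
```

The same pattern can be passed to matches() in dplyr. Keep in mind that matches() ignores case by default, so pass ignore.case = FALSE if a lowercase "a" should not match.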
Upvotes: 1