Reputation: 866
I have a data set with multiple columns. Using R,
I want to keep only those columns whose names start with the character T,
creating a subset like the output data shown below.
Input Data
T1234 T5678 T9101112 A B D E
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
1 2 3 4 5 6 7
Output Data
T1234 T5678 T9101112
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
1 2 3
Any suggestion how this can be achieved? Thanks.
Upvotes: 0
Views: 63
Reputation: 7292
In base R using RegEx
df <- data.frame(T1234=rep(1,7),T5678=2,T9101112=3,A=4,B=5,D=6,E=7)
df[,grepl("^T",names(df))]
The regex pattern ^T
matches T at the beginning of each column name. You could refine the pattern to ^T\\d+
if you wanted to match just "T" followed by one or more digits, as another example.
Also note that ^
asserts that you're at the beginning of the string. Without it you'd match "AT912340" because it contains a T.
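To make the effect of the anchor concrete, here is a small sketch on a made-up vector of names. "AT912340" comes from the example above; the other names are assumptions chosen only for illustration.

```r
# Hypothetical column names, for illustration only
nms <- c("T1234", "AT912340", "A", "T5678x")

anchored   <- grepl("^T", nms)        # T must be the first character
unanchored <- grepl("T", nms)         # T may appear anywhere in the name
digits     <- grepl("^T\\d+$", nms)   # T followed by digits and nothing else

nms[anchored]    # names that start with T
nms[unanchored]  # also picks up "AT912340"
nms[digits]      # only names of the form T<digits>
```

Adding $ to the refined pattern is a further assumption here: it requires the digits to run to the end of the name, which the original ^T\\d+ does not.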
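For multiple characters (e.g. columns that start with T or M) we'd use the "or" operator | inside a group, so that the ^ anchor applies to every alternative rather than only the first:
df[,grepl("^(T|M)",names(df))]
And to match groups of characters like RDY or MTP at the start of the name we'd do it like this:
df[,grepl("^(T|MTP|Check|RDY)",names(df))]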
Note: in the comments I mistakenly used brackets like so: [T,M]. Brackets tell the regex engine to match any one of the characters inside them, so [T,M] matches "T", "M", or ",". We don't want to match a comma here; characters inside brackets are not separated by commas. To match "T" or "M" the correct bracket syntax is [TM]. To match whole words or short strings like those above, however, we must use | as the "or" operator.
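One detail worth checking when combining ^ with |: alternation binds more loosely than the anchor, so a leading ^ applies only to the first alternative unless the alternatives are grouped in parentheses. A minimal sketch, using made-up column names that are not from the question:

```r
# Hypothetical column names, for illustration only
nms <- c("T1", "M2", "AM3", "MTP_a", "X_RDY", "Check1")

loose   <- grepl("^T|M", nms)                # "T at the start" OR "M anywhere"
grouped <- grepl("^(T|M)", nms)              # T or M at the start
bracket <- grepl("^[TM]", nms)               # same result as grouped, single characters only
words   <- grepl("^(T|MTP|Check|RDY)", nms)  # any of these strings at the start

nms[loose]    # includes "AM3", which does not start with T or M
nms[grouped]
nms[words]
```

For single characters the bracket form and the grouped form select the same names; the grouped form is the one that extends to multi-character prefixes.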
Upvotes: 2
Reputation: 121568
Another solution without using regex:
df[,substr(names(df),1,1) %in% c("T","M")]
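As a hedged variation on the same non-regex idea: base R (3.3.0 and later) also has startsWith(), and passing drop = FALSE keeps the result a data frame even when only one column matches. The data frame is the one from the question; the single-column "A" case is only an illustration.

```r
# Example data frame from the question
df <- data.frame(T1234 = rep(1, 7), T5678 = 2, T9101112 = 3,
                 A = 4, B = 5, D = 6, E = 7)

# Logical vector: TRUE for names whose first character is "T"
keep <- startsWith(names(df), "T")
t_cols <- df[, keep, drop = FALSE]

# With a single matching column, drop = FALSE preserves the data frame;
# without it, the result collapses to a plain vector
one_col <- df[, startsWith(names(df), "A"), drop = FALSE]
one_vec <- df[, startsWith(names(df), "A")]
```

Whether the collapse to a vector matters depends on what the subset is used for afterwards; drop = FALSE is the safer default when the number of matching columns is not known in advance.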
Upvotes: 0
Reputation:
> library(dplyr)
> df <- data.frame(T1234=rep(1,7),T5678=2,T9101112=3,A=4,B=5,D=6,E=7)
> df
T1234 T5678 T9101112 A B D E
1 1 2 3 4 5 6 7
2 1 2 3 4 5 6 7
3 1 2 3 4 5 6 7
4 1 2 3 4 5 6 7
5 1 2 3 4 5 6 7
6 1 2 3 4 5 6 7
7 1 2 3 4 5 6 7
> select(df,starts_with('T'))
T1234 T5678 T9101112
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
6 1 2 3
7 1 2 3
>
Or, without dplyr
> df[,grepl('T',colnames(df))]
T1234 T5678 T9101112
1 1 2 3
2 1 2 3
3 1 2 3
4 1 2 3
5 1 2 3
6 1 2 3
7 1 2 3
>
but the latter will match a T in any position of the column name; use the pattern '^T' instead to match only names that start with T.
Upvotes: 1