Reputation: 75
I am reading a text file that has duplicate column names.
file.txt
A B B E E
2 2 4 4 5
3 4 5 6 8
I want to keep the columns whose names are B and E, but when I read the file with
rt<-read.table("file.txt",header=TRUE)
I get:
  A B B.1 E E.1
1 2 2   4 4   5
2 3 4   5 6   8
Can I use a regular expression to select columns when filtering the data frame?
Upvotes: 2
Views: 777
Reputation: 2359
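Another way to get the same result is to drop the column whose name starts with A (this works here because A is the only other column):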
rt[!grepl("^A",colnames(rt))]
B B.1 E E.1
2 4 4 5
4 5 6 8
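A self-contained sketch of the negated grepl approach, assuming the contents of file.txt are passed inline through read.table's text argument instead of being read from disk (the txt string is only a stand-in for the file). Because it keeps every column that does not start with A, any other columns in a wider file would be kept as well.

```r
# Stand-in for file.txt: the same three lines, passed inline via `text =`
txt <- "A B B E E
2 2 4 4 5
3 4 5 6 8"

# With the default check.names = TRUE, the duplicate names become B.1 and E.1
rt <- read.table(text = txt, header = TRUE)

# Keep every column whose name does NOT start with "A"
kept <- rt[!grepl("^A", colnames(rt))]
print(kept)
```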
Upvotes: 1
Reputation: 887981
We can use grep to select the columns whose names start with either B or E. By default, a data.frame does not allow duplicate column names, which is in fact very useful in many ways.
df1[grep("^(B|E)", names(df1))]
# B B.1 E E.1
#1 2 4 4 5
#2 4 5 6 8
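A runnable sketch of the grep selection under the same assumption (the file contents are supplied inline as txt, and df1 is built from them). The matrix-style form with drop = FALSE is included for comparison: it returns the same columns, and drop = FALSE keeps the result a data frame even if the pattern happens to match only one column.

```r
# Stand-in for file.txt, passed inline
txt <- "A B B E E
2 2 4 4 5
3 4 5 6 8"
df1 <- read.table(text = txt, header = TRUE)

# Positions of the columns whose names start with B or E
idx <- grep("^(B|E)", names(df1))

# List-style indexing: always returns a data frame
sel <- df1[idx]

# Matrix-style indexing: drop = FALSE prevents a single match collapsing to a vector
sel2 <- df1[, idx, drop = FALSE]
print(sel)
```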
However, if we need to keep the duplicate column names, we could read the dataset with check.names = FALSE in read.table/read.csv, but I would not recommend doing that, as it creates a lot of confusion when subsetting. Without check.names = FALSE, read.table passes the header through make.names(..., unique = TRUE), which calls make.unique, so duplicate column names get suffixes such as .1 (B.1, E.1).
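To see both behaviours side by side, here is a small sketch, again assuming the data is passed inline as txt rather than read from file.txt. It shows what make.unique does to the header, how check.names = FALSE keeps the raw names, and one source of the subsetting confusion: [[ with a name returns only the first column carrying that name.

```r
# Stand-in for file.txt, passed inline
txt <- "A B B E E
2 2 4 4 5
3 4 5 6 8"

# make.unique() appends .1, .2, ... to repeated strings
print(make.unique(c("A", "B", "B", "E", "E")))

# Default check.names = TRUE: the header becomes A, B, B.1, E, E.1
rt <- read.table(text = txt, header = TRUE)

# check.names = FALSE keeps the duplicated names exactly as in the file
rt_dup <- read.table(text = txt, header = TRUE, check.names = FALSE)

# Subsetting by name only finds the FIRST column called "B"
first_b <- rt_dup[["B"]]

# To get every column literally named B or E, test names() instead
both <- rt_dup[names(rt_dup) %in% c("B", "E")]
print(ncol(both))
```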
Upvotes: 1