How to filter dataframe with same column name?

Question

I am reading a text file that has duplicate column names.

file.txt

A  B  B  E  E
2  2  4  4  5
3  4  5  6  8

I want to keep the columns that have B or E as their column name, but when I read the file the duplicate names get renamed:

rt<-read.table("file.txt",header=TRUE)

  A B B.1 E E.1
1 2 2   4 4   5
2 3 4   5 6   8

Can I use a regular expression when filtering the data frame?

akrun · Accepted Answer

We can use grep to select the columns whose names start with either B or E. By default, a data.frame read with read.table does not keep duplicate column names, which is in fact very useful in many ways.

 df1[grep("^(B|E)", names(df1))]
 #  B B.1 E E.1
 #1 2   4 4   5
 #2 4   5 6   8
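For anyone who wants to try this without creating file.txt, here is a minimal sketch that feeds the question's data to read.table through its text argument. The stricter pattern is my own variation, not part of the answer above: it assumes you only want B, E and their numbered copies (B.1, E.1, ...), so a hypothetical column such as "Beta" would not be picked up.

```r
# Recreate the question's data inline instead of reading file.txt
rt <- read.table(text = "A B B E E
2 2 4 4 5
3 4 5 6 8", header = TRUE)

# Duplicates have been renamed on read
print(names(rt))

# B or E, optionally followed by the ".<number>" suffix added on read
keep <- grep("^(B|E)(\\.[0-9]+)?$", names(rt))
res <- rt[keep]
print(res)
```

The simpler "^(B|E)" works just as well for this file; the anchored version only matters when other column names happen to begin with the same letters.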

However, if we really need to keep the duplicate column names, we could read the dataset with check.names=FALSE in read.table/read.csv, but I would not recommend doing that, as it creates a lot of confusion when subsetting. With the default check.names=TRUE, read.table passes the header through make.names, which in turn uses make.unique to produce unique column names when there are duplicates.
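To illustrate the confusion, here is a small sketch using the same data supplied inline (the variable names rt_raw, pos and res_raw are just for this example). It shows that name-based access only reaches the first of the duplicated columns, so you have to fall back on positions.

```r
# Same data as in the question, but keeping the header exactly as written
rt_raw <- read.table(text = "A B B E E
2 2 4 4 5
3 4 5 6 8", header = TRUE, check.names = FALSE)

print(names(rt_raw))    # duplicates are preserved here

# Access by name returns only the first column called "B"
first_b <- rt_raw[["B"]]
print(first_b)

# To get every B and E column, select by position instead of by name
pos <- which(names(rt_raw) %in% c("B", "E"))
res_raw <- rt_raw[pos]
print(res_raw)

# What the default check.names = TRUE does to such a header
print(make.names(c("A", "B", "B", "E", "E"), unique = TRUE))
```

It is worth checking names(res_raw) after subsetting as well, since you should not assume the duplicate labels survive every operation on the data frame.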
