ankit kumar

Reputation: 75

How to filter a data frame with duplicate column names?

I am reading a text file that has duplicate column names.

file.txt

A  B  B  E  E
2  2  4  4  5
3  4  5  6  8

I want to keep the columns named B and E, but when I read the file, the duplicate names are changed:

rt <- read.table("file.txt", header = TRUE)

  A B B.1 E E.1
1 2 2   4 4   5
2 3 4   5 6   8

Can I use a regular expression to select columns when filtering the data frame?

Upvotes: 2

Views: 777

Answers (2)

Arun kumar mahesh

Reputation: 2359

Another way to get the same result is to drop the columns whose names start with A:

 rt[!grepl("^A", colnames(rt))]
 #  B B.1 E E.1
 #1 2   4 4   5
 #2 4   5 6   8

Upvotes: 1

akrun

Reputation: 887981

We can use grep to select the columns whose names start with either B or E. By default, a data.frame does not allow duplicate column names, which is in fact useful in many ways.

 df1[grep("^(B|E)", names(df1))]
 #  B B.1 E E.1
 #1 2   4 4   5
 #2 4   5 6   8
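
As a reproducible sketch of the same idea, the file from the question can be recreated inline with the text argument of read.table. The stricter pattern below is an assumption about what is wanted: it keeps only B, E, and their de-duplicated variants (B.1, E.1, ...), so a hypothetical column such as Beta would not be picked up by accident. The names txt, keep, and res are illustrative only.

```r
# Recreate the question's file inline so the example is self-contained.
txt <- "A  B  B  E  E
2  2  4  4  5
3  4  5  6  8"
df1 <- read.table(text = txt, header = TRUE)
names(df1)
# "A" "B" "B.1" "E" "E.1"

# Match exactly "B" or "E", optionally followed by a ".<number>" suffix
# of the kind added when duplicate names are made unique.
keep <- grepl("^(B|E)(\\.[0-9]+)?$", names(df1))
res <- df1[keep]
names(res)
# "B" "B.1" "E" "E.1"
```

With the simpler "^(B|E)" pattern, the result is the same for this data; the anchored version only matters if other column names happen to begin with B or E.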

However, if we need to keep the duplicate column names, we could read the dataset with check.names=FALSE in read.table/read.csv. I would not recommend that, as it creates a lot of confusion while subsetting. With the default check.names=TRUE, read.table calls make.names (which in turn uses make.unique), so the column names come out unique even if the file contains duplicates.
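
To illustrate the confusion mentioned above, here is a minimal sketch that reads the same data with check.names = FALSE. The object names (txt, raw, idx, sub) are made up for the example. With duplicate names, name-based access such as raw$B reaches only the first matching column, so the columns have to be picked by position instead.

```r
# Same data as in the question, read without name checking.
txt <- "A  B  B  E  E
2  2  4  4  5
3  4  5  6  8"
raw <- read.table(text = txt, header = TRUE, check.names = FALSE)
names(raw)
# "A" "B" "B" "E" "E"

# Name-based access returns only the first column called "B".
raw$B
# 2 4

# Selecting by position is the reliable way to get every B and E column.
idx <- which(names(raw) %in% c("B", "E"))
sub <- raw[idx]
ncol(sub)
# 4
```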

Upvotes: 1
