Victor.H

Reputation: 157

How to subset a data frame based on multiple columns in R?

I have a data frame with 223k rows and 5 columns. I would like to compare pairs of columns with each other.

A small example of the data frame:

NAME    COLA    COLB    COLC    COLD  
1         T       C       G       A   
2         G       C       G       A   
3         A       C       G       A   
4         A       G       A       G   
5         A       C       A       G   
6         A       G       G       A   
7         A       G       NA      NA  
8         T       C       NA      NA   
9         C       T       A       G   
10        G       A       C       T     
11        A       G       T       C   
12        T       C       C       T   
13        C       T       C       T    

I would like to compare COLC and COLD with COLA and COLB and split the data into groups, like this:

 GROUP 1  
 NAME    COLA    COLB    COLC    COLD  
 1         T       C       G       A   
 10        G       A       C       T   
 9         C       T       A       G   
 11        A       G       T       C  

 GROUP 2  
 NAME    COLA    COLB    COLC    COLD   
 2         G       C       G       A   
 3         A       C       G       A   
 5         A       C       A       G   

 GROUP 3  
 NAME    COLA    COLB    COLC    COLD   
 4         A       G       A       G   
 6         A       G       G       A   
 12        T       C       C       T    
 13        C       T       C       T    

 GROUP 4    
 NAME    COLA    COLB    COLC    COLD    
 7        A       G       NA      NA  
 8        T       C       NA      NA  

I tried using if statements, but that did not work for me. I also tried the subset function, but the factor levels of the columns are not the same: COLA and COLB have 6 levels, while COLC and COLD have 4.

for (i in seq(Tab2$NAME)) {
  if (Tab2$COLC == Tab2$COLA || Tab2$COLC == Tab2$COLB) {
    if (Tab2$COLD == Tab2$COLA || Tab2$COLD == Tab2$COLB) {
      Tab3 <- Tab2[i, ]
      Tab4 <- rbind(Tab4, Tab3)
    }
  }
  if (Tab2$COLC != Tab2$COLA && Tab2$COLC != Tab2$COLB) {
    if (Tab2$COLD != Tab2$COLA && Tab2$COLD != Tab2$COLB) {
      Tab5 <- Tab2[i, ]
      Tab6 <- rbind(Tab6, Tab5)
    }
  }
}

Upvotes: 0

Views: 41

Answers (1)

Monk

Reputation: 407

subset() compares whole columns at once, so no loop is needed. For example:

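# Create the data frame. stringsAsFactors = FALSE keeps the letters as
# character strings, so columns with different sets of values can be compared.
df <- read.table(text = '
NAME    COLA    COLB    COLC    COLD
1         T       C       G       A
2         G       C       G       A
3         A       C       G       A
4         A       G       A       G
5         A       C       A       G
6         A       G       G       A
7         A       G       NA      NA
8         T       C       NA      NA
9         C       T       A       G
10        G       A       C       T
11        A       G       T       C
12        T       C       C       T
13        C       T       C       T
', header = TRUE, stringsAsFactors = FALSE)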

# Example grouping: rows where COLC matches COLA or COLB
# (subset() drops rows where the condition is NA)
group1 <- subset(df, COLC == COLA | COLC == COLB)
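
To get all four groups from the question in one pass, the same vectorised comparison can be extended by counting how many of COLC and COLD match COLA or COLB. This is only a sketch: it assumes the groups mean "no match" (1), "exactly one match" (2), "both match" (3) and "missing values" (4), which is inferred from the example output in the question. The names c_hit, d_hit, n_hit, GROUP and groups are made up for illustration.

```r
# Rebuild the example data with character columns (not factors)
df <- read.table(text = '
NAME    COLA    COLB    COLC    COLD
1         T       C       G       A
2         G       C       G       A
3         A       C       G       A
4         A       G       A       G
5         A       C       A       G
6         A       G       G       A
7         A       G       NA      NA
8         T       C       NA      NA
9         C       T       A       G
10        G       A       C       T
11        A       G       T       C
12        T       C       C       T
13        C       T       C       T
', header = TRUE, stringsAsFactors = FALSE)

# Does COLC (or COLD) equal COLA or COLB? One TRUE/FALSE/NA per row.
c_hit <- df$COLC == df$COLA | df$COLC == df$COLB
d_hit <- df$COLD == df$COLA | df$COLD == df$COLB

# Number of matches per row: 0, 1 or 2 (NA when COLC and COLD are missing)
n_hit <- c_hit + d_hit

# Assumed group numbering, inferred from the question's example:
# 0 matches -> 1, 1 match -> 2, 2 matches -> 3, missing values -> 4
df$GROUP <- ifelse(is.na(n_hit), 4L, n_hit + 1L)

# One data frame per group, in a list named "1" to "4"
groups <- split(df, df$GROUP)
groups[["2"]]
```

The loop in the question fails because if() and || expect a single TRUE or FALSE, while Tab2$COLC == Tab2$COLA returns one value per row. The elementwise operators | and & work on whole columns, which also avoids growing a data frame with rbind() row by row on 223k rows.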

Upvotes: 1
