Reputation: 157
I have a data frame with 223k rows and 5 columns. I would like to compare pairs of columns with each other.
A small example of the data frame:
NAME COLA COLB COLC COLD
1 T C G A
2 G C G A
3 A C G A
4 A G A G
5 A C A G
6 A G G A
7 A G NA NA
8 T C NA NA
9 C T A G
10 G A C T
11 A G T C
12 T C C T
13 C T C T
I would like to compare COLC and COLD with COLA and COLB and subset the data into groups, like this:
GROUP 1
NAME COLA COLB COLC COLD
1 T C G A
10 G A C T
9 C T A G
11 A G T C
GROUP 2
NAME COLA COLB COLC COLD
2 G C G A
3 A C G A
5 A C A G
GROUP 3
NAME COLA COLB COLC COLD
4 A G A G
6 A G G A
12 T C C T
13 C T C T
GROUP 4
NAME COLA COLB COLC COLD
7 A G NA NA
8 T C NA NA
I tried to use if statements, but that doesn't work for me. I also tried the subset function, but the factor levels in the columns are not the same: COLA and COLB have 6 levels, while COLC and COLD have 4.
for (i in seq(Tab2$NAME)) {
  if (Tab2$COLC == Tab2$COLA || Tab2$COLC == Tab2$COLB) {
    if (Tab2$COLD == Tab2$COLA || Tab2$COLD == Tab2$COLB) {
      Tab3 <- Tab2[i, ]
      Tab4 <- rbind(Tab4, Tab3)
    }
  }
  if (Tab2$COLC != Tab2$COLA && Tab2$COLC != Tab2$COLB) {
    if (Tab2$COLD != Tab2$COLA && Tab2$COLD != Tab2$COLB) {
      Tab5 <- Tab2[i, ]
      Tab6 <- rbind(Tab6, Tab5)
    }
  }
}
Upvotes: 0
Views: 41
Reputation: 407
subset() works as in the example below:
# Create the data frame
df <- read.table(text = '
NAME COLA COLB COLC COLD
1 T C G A
2 G C G A
3 A C G A
4 A G A G
5 A C A G
6 A G G A
7 A G NA NA
8 T C NA NA
9 C T A G
10 G A C T
11 A G T C
12 T C C T
13 C T C T
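', header = TRUE, stringsAsFactors = FALSE)
# stringsAsFactors = FALSE keeps the columns as character vectors, so ==
# does not fail on factors with different level sets
# Example grouping: rows where COLC equals COLA or COLB
# (rows where COLC is NA are dropped by subset)
group1 <- subset(df, COLC == COLA | COLC == COLB)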
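To get all four groups from the question at once, one option is to count, per row, how many of COLC and COLD appear in COLA or COLB. This is only a sketch: the helper names c_match, d_match, n_match and groups, and the added GROUP column, are my own, and the group numbering (1 = no match, 2 = one match, 3 = both match, 4 = NA) is inferred from the example output in the question. It is vectorized, so it should not need a loop over the 223k rows.

```r
# Rebuild the example data; stringsAsFactors = FALSE keeps the columns as
# character vectors, so == never compares factors with different level sets
df <- read.table(text = '
NAME COLA COLB COLC COLD
1 T C G A
2 G C G A
3 A C G A
4 A G A G
5 A C A G
6 A G G A
7 A G NA NA
8 T C NA NA
9 C T A G
10 G A C T
11 A G T C
12 T C C T
13 C T C T
', header = TRUE, stringsAsFactors = FALSE)

# TRUE where COLC (resp. COLD) equals COLA or COLB; NA where it is missing
c_match <- df$COLC == df$COLA | df$COLC == df$COLB
d_match <- df$COLD == df$COLA | df$COLD == df$COLB

# Number of matching columns per row: 0, 1, 2, or NA
n_match <- c_match + d_match

# Assumed numbering: 0 matches -> 1, 1 match -> 2, 2 matches -> 3, NA -> 4
df$GROUP <- ifelse(is.na(n_match), 4L, c(1L, 2L, 3L)[n_match + 1L])

# One data frame per group, named "1" to "4"
groups <- split(df[, c("NAME", "COLA", "COLB", "COLC", "COLD")], df$GROUP)
```

groups[["1"]] then holds the rows with no match, groups[["3"]] the rows where both COLC and COLD match, and so on. If the real data frame already has factor columns, converting them first with as.character() should avoid the differing-levels problem.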
Upvotes: 1