nik

Reputation: 2584

How to find which rows differ from the first two columns by more than a threshold

I have data that looks like this:

df<-structure(list(C1 = c(0.003926348, 0.001642442, 6.72e-05, 0.000314789, 
0.00031372, 0.000196342, 0.01318432, 8.86e-05, 0.005671017, 0.003616196, 
0.026635645, 0.001136402, 0.000161111, 0.005777738, 0.000145104, 
0.000996546, 4.27e-05, 0.000114159, 0.001152384, 0.002860251, 
0.000284873), C2 = c(0.003901373, 0.001526195, 6.3e-05, 0.000387266, 
0.000312458, 0.000256647, 0.012489205, 0.00013071, 0.005196136, 
0.003059834, 0.024624562, 0.001025486, 0.000144964, 0.005659078, 
0.000105755, 0.000844871, 5.88e-05, 0.000118831, 0.000999354, 
0.002153167, 0.000214486), T1 = c(0.003646894, 0.001484503, 4.93e-05, 
0.00036715, 0.000333378, 0.000244199, 0.010286787, 6.48e-05, 
0.006180042, 0.00387491, 0.025428464, 0.001075376, 0.000122088, 
0.005448152, 0.000103301, 0.000974826, 4.32e-05, 0.000109876, 
0.001030364, 0.002777244, 0.000221169), T2 = c(0.00050388, 0.001135969, 
0.000113829, 2.14e-06, 0.00010293, 0.000315704, 0.01160593, 8.46e-05, 
0.004495437, 0.003062559, 0.018662663, 0.002096675, 0.000214814, 
0.002177849, 8.61e-05, 0.001057254, 3.27e-05, 0.000115822, 0.008133257, 
0.021657018, 0.000261339), G1 = c(0.001496712, 0.001640965, 0.000129124, 
3.02e-06, 0.000122839, 0.000305686, 0.01378774, 0.000199637, 
0.00534668, 0.00300097, 0.023290941, 0.002595433, 0.000262479, 
0.002926346, 0.000125655, 0.001302624, 4.89e-05, 0.000122862, 
0.009851791, 0.017621282, 0.000197561), G2 = c(0.00114337, 0.001285636, 
0.000122848, 2.46e-06, 9.1e-05, 0.000288897, 0.012288087, 0.000122286, 
0.002575368, 0.002158011, 0.022008677, 0.002017026, 0.000241754, 
0.003340175, 0.00013424, 0.001517655, 4.78e-05, 0.000110353, 
0.008293286, 0.018999466, 0.000191129)), .Names = c("C1", "C2", 
"T1", "T2", "G1", "G2"), row.names = c("A", "B", "C", "D", "E", 
"F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P", "PP", 
"TT", "EE", "FF", "AS"), class = "data.frame")

I want to find the rows where the average of a later pair of columns is higher or lower than the average of the first two columns (row-wise) by at least some threshold. For example:

      C1    C2    C3    C4
A    1.2   1.3   1.6   1.9
B    1.2   1.0   0.1   0.2

Average of C1 and C2: (1.2 + 1.3) / 2 = 1.25
Average of C3 and C4: (1.6 + 1.9) / 2 = 1.75

So A is TRUE, because the average of its first two values differs by 0.5 from the average of its second two values.

The same goes for B:

Average of C1 and C2: (1.2 + 1.0) / 2 = 1.1
Average of C3 and C4: (0.1 + 0.2) / 2 = 0.15

The difference is 0.95, so B is also TRUE.

I just want to compare each later pair of columns against the first pair. The real data above has three pairs of columns, so that means columns 1 and 2 versus columns 3 and 4, and columns 1 and 2 versus columns 5 and 6.
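The arithmetic in the small example can be reproduced with a short sketch. The names `toy`, `thr`, `first_pair`, `second_pair`, `diff_abs` and `flagged` are made up here for illustration, and the comparison is assumed to be on the absolute difference of the pair means with a threshold of 0.5.

```r
# Hypothetical toy data matching the small example (not the real df)
toy <- data.frame(C1 = c(1.2, 1.2), C2 = c(1.3, 1.0),
                  C3 = c(1.6, 0.1), C4 = c(1.9, 0.2),
                  row.names = c("A", "B"))

thr <- 0.5  # assumed threshold

# Row-wise mean of the first pair and of the second pair
first_pair  <- rowMeans(toy[, c("C1", "C2")])
second_pair <- rowMeans(toy[, c("C3", "C4")])

# Absolute difference, so "higher" and "lower" both count
diff_abs <- abs(first_pair - second_pair)

# Rows whose pair means differ by at least the threshold
flagged <- diff_abs >= thr
```

Row A sits on the boundary (a difference of about 0.5), so whether it is flagged depends on choosing `>=` rather than `>`, and on floating-point rounding.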

Upvotes: 1

Views: 35

Answers (1)

LyzandeR

Reputation: 37879

One way:

# abs() flags a difference in either direction (higher or lower)
df$A <- with(df, abs((C1 + C2) / 2 - (T1 + T2) / 2) > 0.5)
df$B <- with(df, abs((C1 + C2) / 2 - (G1 + G2) / 2) > 0.5)

Out:

> df
            C1          C2          T1          T2          G1          G2     A     B
A  0.003926348 0.003901373 0.003646894 0.000503880 0.001496712 0.001143370 FALSE FALSE
B  0.001642442 0.001526195 0.001484503 0.001135969 0.001640965 0.001285636 FALSE FALSE
C  0.000067200 0.000063000 0.000049300 0.000113829 0.000129124 0.000122848 FALSE FALSE
D  0.000314789 0.000387266 0.000367150 0.000002140 0.000003020 0.000002460 FALSE FALSE
E  0.000313720 0.000312458 0.000333378 0.000102930 0.000122839 0.000091000 FALSE FALSE
F  0.000196342 0.000256647 0.000244199 0.000315704 0.000305686 0.000288897 FALSE FALSE
G  0.013184320 0.012489205 0.010286787 0.011605930 0.013787740 0.012288087 FALSE FALSE
H  0.000088600 0.000130710 0.000064800 0.000084600 0.000199637 0.000122286 FALSE FALSE
I  0.005671017 0.005196136 0.006180042 0.004495437 0.005346680 0.002575368 FALSE FALSE
J  0.003616196 0.003059834 0.003874910 0.003062559 0.003000970 0.002158011 FALSE FALSE
K  0.026635645 0.024624562 0.025428464 0.018662663 0.023290941 0.022008677 FALSE FALSE
L  0.001136402 0.001025486 0.001075376 0.002096675 0.002595433 0.002017026 FALSE FALSE
M  0.000161111 0.000144964 0.000122088 0.000214814 0.000262479 0.000241754 FALSE FALSE
N  0.005777738 0.005659078 0.005448152 0.002177849 0.002926346 0.003340175 FALSE FALSE
O  0.000145104 0.000105755 0.000103301 0.000086100 0.000125655 0.000134240 FALSE FALSE
P  0.000996546 0.000844871 0.000974826 0.001057254 0.001302624 0.001517655 FALSE FALSE
PP 0.000042700 0.000058800 0.000043200 0.000032700 0.000048900 0.000047800 FALSE FALSE
TT 0.000114159 0.000118831 0.000109876 0.000115822 0.000122862 0.000110353 FALSE FALSE
EE 0.001152384 0.000999354 0.001030364 0.008133257 0.009851791 0.008293286 FALSE FALSE
FF 0.002860251 0.002153167 0.002777244 0.021657018 0.017621282 0.018999466 FALSE FALSE
AS 0.000284873 0.000214486 0.000221169 0.000261339 0.000197561 0.000191129 FALSE FALSE
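Every value in this data is below 0.03, so an absolute threshold of 0.5 cannot flag any row, which is why both columns are all FALSE. As a rough sketch of how the comparison could be generalised, the helper below (`flag_pairs`, a hypothetical name, not part of any package) compares a reference pair against any number of other pairs using `rowMeans`. The threshold of 0.005 is an assumption chosen only to make the illustration produce some TRUE values, and `small` holds three rows copied from the question's data to keep the example short.

```r
# Hypothetical helper: compare the row-wise mean of a reference pair of
# columns against the row-wise mean of each other pair of columns.
flag_pairs <- function(data, ref, pairs, thr) {
  ref_mean <- rowMeans(data[, ref, drop = FALSE])
  # One logical column per comparison pair
  sapply(pairs, function(p) {
    abs(ref_mean - rowMeans(data[, p, drop = FALSE])) > thr
  })
}

# Three rows copied from the question's data
small <- data.frame(
  C1 = c(0.003926348, 0.001152384, 0.002860251),
  C2 = c(0.003901373, 0.000999354, 0.002153167),
  T1 = c(0.003646894, 0.001030364, 0.002777244),
  T2 = c(0.000503880, 0.008133257, 0.021657018),
  G1 = c(0.001496712, 0.009851791, 0.017621282),
  G2 = c(0.001143370, 0.008293286, 0.018999466),
  row.names = c("A", "EE", "FF")
)

res <- flag_pairs(small,
                  ref   = c("C1", "C2"),
                  pairs = list(T = c("T1", "T2"), G = c("G1", "G2")),
                  thr   = 0.005)  # assumed threshold, for illustration only
print(res)
```

With this assumed threshold, row A is not flagged for either pair, EE is flagged only against the G columns, and FF is flagged against both. Because the values span several orders of magnitude, a relative threshold (for example, a ratio of the pair means) may suit the data better than an absolute one, but that is a choice the question leaves open.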

Upvotes: 1
