Reputation: 3432
I would need help in order to add a new column into this table:
> table1
parent node blgth label
1 19 1 0.1357120409 SP_A
2 19 2 0.0794631914 SP_B
3 18 3 0.0861335837 SP_C
4 17 4 0.6693699494 SP_D
5 21 5 0.9480669221 SP_E
6 21 6 1.1270609760 SP_F
7 20 7 0.4409983357 SP_G
8 20 8 0.4412279821 SP_H
9 16 9 1.1449969710 SP_I
10 22 10 1.0935878043 SP_J
11 23 11 0.0000025842 SP_K
12 23 12 0.0053554342 SP_L
13 13 13 NA Root
14 13 14 0.4741914879 71.5
15 14 15 0.0606263444 87.6
16 15 16 0.1427482127 40.4
17 16 17 0.0836374533 100
18 17 18 0.4057476840 99.8
19 18 19 0.0979910062 87.7
20 15 20 0.4931972115 100
21 14 21 0.0957999921 73.1
22 13 22 0.7289752119 100
23 22 23 0.6091491479 100
The idea is to parse another table:
> table2
label VALUE Event
1 SP_E NA 1
2 SP_G 0.0026 4
3 SP_H 0.0026 4
4 SP_I NA 3
5 SP_F NA 2
6 SP_A 0.0970 5
7 SP_B 0.0970 5
8 SP_C 0.0970 5
9 SP_E NA 1
10 SP_G 0.0026 4
11 SP_H 0.0026 4
12 SP_F NA 2
13 SP_A 0.0970 5
14 SP_B 0.0970 5
15 SP_C 0.0970 5
16 SP_E NA 1
17 SP_G 0.0026 4
18 SP_H 0.0026 4
19 SP_F NA 2
20 SP_A 0.0970 5
21 SP_B 0.0970 5
22 SP_C 0.0970 5
23 SP_E NA 1
24 SP_G 0.0026 4
25 SP_H 0.0026 4
26 SP_F NA 2
27 SP_A 0.0970 5
28 SP_B 0.0970 5
29 SP_C 0.0970 5
The idea is to first within the table2
groupby Event
in order to see which label
are in the same Event
:
for instance:
SP_A
and SP_B
are in the Event 5
their VALUE
is 0.0970, and the idea is to check in the table1
the parent numbers
associated with this 3 SP labels:
here the parent values
are 19 and 18, then I simply add the VALUE
(0.0970) to the lowest node number (here 18) into a new column VALUE
in table1
:
parent node blgth label VALUE
19 1 0.1357120409 SP_A
19 2 0.0794631914 SP_B
18 3 0.0861335837 SP_C
17 4 0.6693699494 SP_D
21 5 0.9480669221 SP_E
21 6 1.1270609760 SP_F
20 7 0.4409983357 SP_G
20 8 0.4412279821 SP_H
16 9 1.1449969710 SP_I
22 10 1.0935878043 SP_J
23 11 0.0000025842 SP_K
23 12 0.0053554342 SP_L
13 13 NA Root
13 14 0.4741914879 71.5
14 15 0.0606263444 87.6
15 16 0.1427482127 40.4
16 17 0.0836374533 100
17 18 0.4057476840 99.8 0.0970
18 19 0.0979910062 87.7
15 20 0.4931972115 100
14 21 0.0957999921 73.1
13 22 0.7289752119 100
22 23 0.6091491479 100
then I do the same for Event 4:
SP_G
and SP_H
are in the Event 4
their VALUE
is 0.0026, the parent
value in table1
is 20, then I add the VALUE
0.0026 to the corresponding node number
20 into a new column VALUE
in table1
:
parent node blgth label VALUE
19 1 0.1357120409 SP_A
19 2 0.0794631914 SP_B
18 3 0.0861335837 SP_C
17 4 0.6693699494 SP_D
21 5 0.9480669221 SP_E
21 6 1.1270609760 SP_F
20 7 0.4409983357 SP_G
20 8 0.4412279821 SP_H
16 9 1.1449969710 SP_I
22 10 1.0935878043 SP_J
23 11 0.0000025842 SP_K
23 12 0.0053554342 SP_L
13 13 NA Root
13 14 0.4741914879 71.5
14 15 0.0606263444 87.6
15 16 0.1427482127 40.4
16 17 0.0836374533 100
17 18 0.4057476840 99.8 0.0970
18 19 0.0979910062 87.7
15 20 0.4931972115 100 0.0026
14 21 0.0957999921 73.1
13 22 0.7289752119 100
22 23 0.6091491479 100
If I do the process for the other one (they are lonely within event), I will simply add NA values
(because their VALUE
is NA in table2 anyway)
here is the expected final result:
parent node blgth label Mean_dNdS
19 1 0.1357120409 SP_A
19 2 0.0794631914 SP_B
18 3 0.0861335837 SP_C
17 4 0.6693699494 SP_D
21 5 0.9480669221 SP_E
21 6 1.1270609760 SP_F
20 7 0.4409983357 SP_G
20 8 0.4412279821 SP_H
16 9 1.1449969710 SP_I
22 10 1.0935878043 SP_J
23 11 0.0000025842 SP_K
23 12 0.0053554342 SP_L
13 13 NA Root
13 14 0.4741914879 71.5 NA
14 15 0.0606263444 87.6 NA
15 16 0.1427482127 40.4 NA
16 17 0.0836374533 100 NA
17 18 0.4057476840 99.8 0.0970
18 19 0.0979910062 87.7
15 20 0.4931972115 100 0.0026
14 21 0.0957999921 73.1 NA
13 22 0.7289752119 100 NA
22 23 0.6091491479 100 NA
Does someone would have and idea? I guess dplyr or a tydiverse method could be useful since it include groupby loop?
data:
table1
structure(list(parent = c(19L, 19L, 18L, 17L, 21L, 21L, 20L,
20L, 16L, 22L, 23L, 23L, 13L, 13L, 14L, 15L, 16L, 17L, 18L, 15L,
14L, 13L, 22L), node = 1:23, blgth = c(0.1357120409, 0.0794631914,
0.0861335837, 0.6693699494, 0.9480669221, 1.127060976, 0.4409983357,
0.4412279821, 1.144996971, 1.0935878043, 2.5842e-06, 0.0053554342,
NA, 0.4741914879, 0.0606263444, 0.1427482127, 0.0836374533, 0.405747684,
0.0979910062, 0.4931972115, 0.0957999921, 0.7289752119, 0.6091491479
), label = structure(c(9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L,
17L, 18L, 19L, 20L, 8L, 3L, 5L, 2L, 1L, 7L, 6L, 1L, 4L, 1L, 1L
), .Label = c("100", "40.4", "71.5", "73.1", "87.6", "87.7",
"99.8", "Root", "SP_A", "SP_B", "SP_C", "SP_D", "SP_E", "SP_F",
"SP_G", "SP_H", "SP_I", "SP_J", "SP_K", "SP_L"), class = "factor")), class = "data.frame", row.names = c(NA,
-23L))
table2
structure(list(label = structure(c(4L, 6L, 7L, 8L, 5L, 1L, 2L,
3L, 4L, 6L, 7L, 5L, 1L, 2L, 3L, 4L, 6L, 7L, 5L, 1L, 2L, 3L, 4L,
6L, 7L, 5L, 1L, 2L, 3L), .Label = c("SP_A", "SP_B", "SP_C", "SP_E",
"SP_F", "SP_G", "SP_H", "SP_I"), class = "factor"), VALUE = c(NA,
0.0026, 0.0026, NA, NA, 0.097, 0.097, 0.097, NA, 0.0026, 0.0026,
NA, 0.097, 0.097, 0.097, NA, 0.0026, 0.0026, NA, 0.097, 0.097,
0.097, NA, 0.0026, 0.0026, NA, 0.097, 0.097, 0.097), Event = c(1L,
4L, 4L, 3L, 2L, 5L, 5L, 5L, 1L, 4L, 4L, 2L, 5L, 5L, 5L, 1L, 4L,
4L, 2L, 5L, 5L, 5L, 1L, 4L, 4L, 2L, 5L, 5L, 5L)), class = "data.frame", row.names = c(NA,
-29L))
Upvotes: 0
Views: 65
Reputation: 4658
I think this is what you want.
We join table1 to table2 by label
to be able to use the parent
column. Then, we group by Event
and take the mean VALUE
(since it is always the same we could've taken min
or first
as well), and take the smallest value of parent
.
library(tidyverse)
table3 <- table2 %>%
left_join(table1, by = "label") %>%
group_by(Event) %>%
summarize(Mean_dNdS = mean(VALUE),
parent = min(parent)) %>%
select(parent, Mean_dNdS)
This gives
parent Mean_dNdS
<int> <dbl>
1 21 NA
2 21 NA
3 16 NA
4 20 0.0026
5 18 0.097
Then we can simply join this table to table1
and get the result you want.
table1 %>%
left_join(table3, by = c("node" = "parent"))
giving
parent node blgth label Mean_dNdS
1 19 1 0.1357120409 SP_A NA
2 19 2 0.0794631914 SP_B NA
3 18 3 0.0861335837 SP_C NA
4 17 4 0.6693699494 SP_D NA
5 21 5 0.9480669221 SP_E NA
6 21 6 1.1270609760 SP_F NA
7 20 7 0.4409983357 SP_G NA
8 20 8 0.4412279821 SP_H NA
9 16 9 1.1449969710 SP_I NA
10 22 10 1.0935878043 SP_J NA
11 23 11 0.0000025842 SP_K NA
12 23 12 0.0053554342 SP_L NA
13 13 13 NA Root NA
14 13 14 0.4741914879 71.5 NA
15 14 15 0.0606263444 87.6 NA
16 15 16 0.1427482127 40.4 NA
17 16 17 0.0836374533 100 NA
18 17 18 0.4057476840 99.8 0.0970
19 18 19 0.0979910062 87.7 NA
20 15 20 0.4931972115 100 0.0026
21 14 21 0.0957999921 73.1 NA
22 14 21 0.0957999921 73.1 NA
23 13 22 0.7289752119 100 NA
24 22 23 0.6091491479 100 NA
Upvotes: 1