Reputation: 65
I have a dataset that includes many "transects", and a multiple "transects" comprise a "plane" (e.g. Plane P1 = Transect T1 + Transect T2)
The current data structure (see example below) has the length of each transect repeated in the column "transect length" for each sample that was collected in the transect. I need to sum the transect lengths to get the "plane length" for EACH plane.
Here is what the data structure I've been given looks like:
Plane name | Transect name | Transect length | Sample |
---|---|---|---|
P1 | T1 | 1002 | x |
P1 | T1 | 1002 | y |
P1 | T2 | 1034 | z |
P1 | T2 | 1034 | a |
P1 | T2 | 1034 | b |
P2 | T3 | 1222 | c |
P2 | T3 | 1222 | d |
P2 | T4 | 1324 | e |
I added an "index" column, thinking there must be some way I can add the transect lengths when the index for that row = 1.
data3 <- data2 %>%
group_by(transect)%>%
mutate(index = seq(n())) %>%
ungroup() %>%
Table showing the added index column:
Plane name | Transect name | Transect length | Sample | Index |
---|---|---|---|---|
P1 | T1 | 1002 | x | 1 |
P1 | T1 | 1002 | y | 2 |
P1 | T2 | 1034 | z | 1 |
P1 | T2 | 1034 | a | 2 |
P1 | T2 | 1034 | b | 3 |
P2 | T3 | 1222 | c | 1 |
P2 | T3 | 1222 | d | 2 |
P2 | T4 | 1324 | e | 1 |
But now I am stuck on how to add only the transect lengths with index = 1 for a certain plane, and iterate over all the planes...
So for example, in the end, what I need it to look like is this:
Plane name | Transect name | Transect length | Sample | Index | Plane length |
---|---|---|---|---|---|
P1 | T1 | 1002 | x | 1 | 2036 |
P1 | T1 | 1002 | y | 2 | 2036 |
P1 | T2 | 1034 | z | 1 | 2036 |
P1 | T2 | 1034 | a | 2 | 2036 |
P1 | T2 | 1034 | b | 3 | 2036 |
P2 | T3 | 1222 | c | 1 | 2546 |
P2 | T3 | 1222 | d | 2 | 2546 |
P2 | T4 | 1324 | e | 1 | 2546 |
Notice the plane length for plane P1 is the sum of only the T1 transect lengths that have index = 1. So: 1002 + 1034 = 2036, NOT 1002 + 1002 + 1034 + 1034 + 1034 = 5106.
I tried several variations of code using mutate, sum, summarize, etc. with no luck:
plane_lengths <- data3 %>%
mutate(plane_length = sum(transect_length[index ==1]))
^ this one seems to do nothing. :(
plane_lengths <- data3 %>% group_by(plane_name) %>% summarize(plane_length = sum(transect_length))
^ this one just adds every value of transect length (all index values, not just index = 1)
Is there some way to modify the code above to take only values for transect length where the index = 1?
I am still a bit new to R, and I really appreciate any help I can get!!! Thank you.
Upvotes: 1
Views: 2974
Reputation: 887148
After creating the 'index', change the grouping to 'Planename' before we create the 'PlaneLength' column
library(dplyr)
data2 %>%
group_by(Transectname) %>%
mutate(index = row_number()) %>%
group_by(Planename) %>%
mutate(PlaneLength = sum(Transectlength[index == 1])) %>%
ungroup
-output
# A tibble: 8 × 6
Planename Transectname Transectlength Sample index PlaneLength
<chr> <chr> <int> <chr> <int> <int>
1 P1 T1 1002 x 1 2036
2 P1 T1 1002 y 2 2036
3 P1 T2 1034 z 1 2036
4 P1 T2 1034 a 2 2036
5 P1 T2 1034 b 3 2036
6 P2 T3 1222 c 1 2546
7 P2 T3 1222 d 2 2546
8 P2 T4 1324 e 1 2546
We may also use with_groups
so that it won't create the group attrribute
data2 %>%
with_groups(Transectname, mutate, index = row_number()) %>%
with_groups(Planename, mutate,
PlaneLength = sum(Transectlength[index == 1]))
-output
Planename Transectname Transectlength Sample index PlaneLength
1 P1 T1 1002 x 1 2036
2 P1 T1 1002 y 2 2036
3 P1 T2 1034 z 1 2036
4 P1 T2 1034 a 2 2036
5 P1 T2 1034 b 3 2036
6 P2 T3 1222 c 1 2546
7 P2 T3 1222 d 2 2546
8 P2 T4 1324 e 1 2546
data2 <- structure(list(Planename = c("P1", "P1", "P1", "P1", "P1", "P2",
"P2", "P2"), Transectname = c("T1", "T1", "T2", "T2", "T2", "T3",
"T3", "T4"), Transectlength = c(1002L, 1002L, 1034L, 1034L, 1034L,
1222L, 1222L, 1324L), Sample = c("x", "y", "z", "a", "b", "c",
"d", "e")), class = "data.frame", row.names = c(NA, -8L))
Upvotes: 0