Reputation: 188
I have a dataframe with ID's (100) and each ID has different number of rows and all ID's have same number of columns.
the sample dataframe looks like as follows
a <- data.frame(ID = c(1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,2,2),
A = c(12,12.5,15,16,18,20,25,26,29,35, 12,12.5,15,16,18,20,25,26,29,35),
B = c(20,19,18,17,16,20,25,28,30,35, 20,19,18,17,16,20,25,28,30,35),
C = c(2,1,5,9,10,11,13,18,25,27,2,1,5,9,10,11,13,18,25,27))
in each ID I want to subset the data between two specified values, these two values are in two other dataframes respectively.
the first value is the first row of each ID and the sample dataframe is as follows
b <- data.frame(ID = c(1,2),
A = c(12.0,12.0),
B = c(20,20),
C = c(2,2))
the second value is the specified row based on some value in a vector and the sample dataframe is as follows
c <- data.frame(ID = c(1,2),
A = c(25.0,20.0),
B = c(25,20),
C = c(13,11))
if we can observe, the rows corresponding to each ID, the values are same as in the main dataframe 'a'
the expected dataframe is as follows
d <- data.frame(ID = c(1,1,1,1,1,1,1,2,2,2,2,2,2),
A = c(12,12.5,15,16,18,20,25, 12,12.5,15,16,18,20),
B = c(20,19,18,17,16,20,25, 20,19,18,17,16,20),
C = c(2,1,5,9,10,11,13,2,1,5,9,10,11))
to get the expected output, I have tried the following code.....but failed
for (i in 1:nrow(b)){
Azimuth[i] = (a[which(a$A == b$A[i]):which(a$A == c$A[i])])
}
here, I am trying to use two dataframes 'b' and 'c' to subset the data from 'a'. but is it possible to get the same output without using dataframe 'b'?! because in dataframe 'b', each row is the first row in each ID from dataframe 'a'
Upvotes: 0
Views: 243
Reputation: 8844
A dplyr
solution
library(dplyr)
a %>%
mutate(end = FALSE) %>%
rows_update(c %>% mutate(end = TRUE), by = c("ID", "A", "B", "C")) %>%
group_by(ID) %>%
slice(1:which(end)) %>%
select(-end)
Output:
# A tibble: 13 x 4
# Groups: ID [2]
ID A B C
<dbl> <dbl> <dbl> <dbl>
1 1 12 20 2
2 1 12.5 19 1
3 1 15 18 5
4 1 16 17 9
5 1 18 16 10
6 1 20 20 11
7 1 25 25 13
8 2 12 20 2
9 2 12.5 19 1
10 2 15 18 5
11 2 16 17 9
12 2 18 16 10
13 2 20 20 11
Explanation:
I guess you want to use a dataframe to subset another one because you want to subset a
only if there exist certain combinations of ID
, A
, B
and C
, which you specify in your dataframe c
?
If that is the case, your goal can be achieved by taking the following steps:
a
. Call it end
and default to FALSE
.c
but set its default to TRUE
.end
in c
to update the end
in a
for each row marked by a combination of ID
, A
, B
and C
. In this way, the variable end
will become TRUE
only when there is a full match between a
and c
for the other four variables. If you cannot find a full match, then you will get this
Error: Attempting to update missing rows.
group_by(ID)
and select from the first row until where end
is TRUE
for each group defined by ID
.end
variable since it has no use any more.Upvotes: 2