Reputation: 35
I have a question that might be simple but I just can't figure it out. An example of my dataframe looks like this:
> df.corrected
Grp Grp.ind ini.1 fin.1 ini.2 fin.2
1 A A.1 0 5 0 5
2 A A.2 5 10 5 25
3 A A.3 10 15 NA NA
4 A A.4 15 26 NA NA
5 A A.5 26 28 NA NA
6 A A.6 28 30 25 30
7 B B.1 0 15 0 10
8 B B.2 NA NA 10 20
9 B B.3 15 20 20 25
10 B B.4 20 31 25 30
11 B B.5 31 50 30 50
I want to look in each group for the row number in which ini.1 for each observation lies between possible ini.2 and fin.2 values (i.e. ini.2<=ini.1< fin.2, for all pairs of ini.2,fin.2). I want to know the row number to create a new grp_ind id. My desire output should look like this:
> df.corrected
Grp Grp.ind ini.1 fin.1 ini.2 fin.2 rownum New.Grp.ind
1 A A.1 0 5 0 5 1 A.1
2 A A.2 5 10 5 25 2 A.2
3 A A.3 10 15 NA NA 2 A.2
4 A A.4 15 26 NA NA 2 A.2
5 A A.5 26 28 NA NA 6 A.6
6 A A.6 28 30 25 30 6 A.6
7 B B.1 0 15 0 10 1 B.1
8 B B.2 NA NA 10 20 NA B.2
9 B B.3 15 20 20 25 2 B.2
10 B B.4 20 31 25 30 3 B.3
11 B B.5 31 50 30 50 5 B.5
So far I tried:
df.corrected<-df %>%
group_by(Grp) %>%
mutate(rownum=ifelse(!(ini.1 >=ini.2 & ini.1 < fin.2),
NA, row_number())) %>%
mutate(rownum=ifelse(is.na(rownum),
row_number(which((ini.1 >=(ini.2%in%ini.2)) & (ini.1 < (fin.2%in%fin.2)))),rownum)) %>%
mutate(New.Grp.ind = Grp.ind[rownum])
and also taking out which() in the second mutate() and I did not have any success. Because my dataframe has over 6k observations I want a flexible solution without using na.locf. I tried that solution before and does not perform well in the whole dataset.
Does anyone have a guidance on how to tackle this problem?
I thank you all in advance for your help.
Upvotes: 1
Views: 293
Reputation: 20085
The approach using dplyr
is to self-join df.corrected
by Grp
and then apply filter
to satisfy rules mentioned by OP. This will provide New.Grp.ind
for matching conditions. Finally, for rows which could not meet matching rules we have join back (using right_join
) with df.corrected
by Grp
and Grp.ind
.
Note: I have assumed that Grp + Grp.ind
represents a row uniquely in data. If not, then a row number
should be added in data which can be using as part of right_join
.
library(dplyr)
df.corrected %>% inner_join((df.corrected %>% group_by(Grp) %>%
mutate(rownum= row_number())), by="Grp") %>%
filter(ini.1.x >=ini.2.y & ini.1.x < fin.2.y) %>%
select( Grp, Grp.ind = Grp.ind.x, ini.1 = ini.1.x, fin.1 = fin.1.x, ini.2 = ini.2.x,
fin.2 = fin.2.x, rownum, New.Grp.ind = Grp.ind.y) %>%
right_join(df.corrected, by=c("Grp","Grp.ind")) %>%
select( Grp, Grp.ind, ini.1 = ini.1.x, fin.1 = fin.1.x, ini.2 = ini.2.x, fin.2 = fin.2.x,
rownum, New.Grp.ind) %>%
mutate(New.Grp.ind = coalesce(New.Grp.ind, Grp.ind))
# Grp Grp.ind ini.1 fin.1 ini.2 fin.2 rownum New.Grp.ind
# 1 A A.1 0 5 0 5 1 A.1
# 2 A A.2 5 10 5 25 2 A.2
# 3 A A.3 10 15 NA NA 2 A.2
# 4 A A.4 15 26 NA NA 2 A.2
# 5 A A.5 26 28 NA NA 6 A.6
# 6 A A.6 28 30 25 30 6 A.6
# 7 B B.1 0 15 0 10 1 B.1
# 8 B B.2 NA NA NA NA NA B.2
# 9 B B.3 15 20 20 25 2 B.2
# 10 B B.4 20 31 25 30 3 B.3
# 11 B B.5 31 50 30 50 5 B.5
Data:
df.corrected <- read.table(text =
"Grp Grp.ind ini.1 fin.1 ini.2 fin.2
1 A A.1 0 5 0 5
2 A A.2 5 10 5 25
3 A A.3 10 15 NA NA
4 A A.4 15 26 NA NA
5 A A.5 26 28 NA NA
6 A A.6 28 30 25 30
7 B B.1 0 15 0 10
8 B B.2 NA NA 10 20
9 B B.3 15 20 20 25
10 B B.4 20 31 25 30
11 B B.5 31 50 30 50",
header = TRUE, stringsAsFactors = FALSE)
Upvotes: 1