Reputation: 85
For example, here is the dataframe. I want to delete the rows if the age difference in each ProjectNumber is less than 5. Within ProjectNumber = 1, the age difference are 6(56-50) so rows in ProjectNumber 1 is maintained. In ProjectNumber2, the age difference are 2(37-35), so 3 rows in ProjectNumber 2 will be deleted. How can I do that? There are thousands of Project Number in real data set so can't really do it mannually. The key problem is how to specify these rows and delete them. Thank you.
ProjectNumber | Age |
---|---|
1 | 50 |
1 | 52 |
1 | 53 |
1 | 55 |
1 | 56 |
2 | 35 |
2 | 36 |
2 | 37 |
3 | 40 |
3 | 41 |
3 | 42 |
3 | 43 |
3 | 45 |
3 | 46 |
Upvotes: 4
Views: 65
Reputation: 102349
A data.table
option
> setDT(df)[, .SD[diff(range(Age)) >= 5], ProjectNumber]
ProjectNumber Age
1: 1 50
2: 1 52
3: 1 53
4: 1 55
5: 1 56
6: 3 40
7: 3 41
8: 3 42
9: 3 43
10: 3 45
11: 3 46
Upvotes: 1
Reputation: 79174
Akrun's answer is top. Here: cumbersome but leads to the goal:
df %>%
group_by(ProjectNumber) %>%
mutate(diff_Age = max(Age) - min(Age)) %>%
subset(!diff_Age < 5) %>%
select(-diff_Age)
Output:
ProjectNumber Age
<int> <int>
1 1 50
2 1 52
3 1 53
4 1 55
5 1 56
6 3 40
7 3 41
8 3 42
9 3 43
10 3 45
11 3 46
Upvotes: 3
Reputation: 160647
Base R variants:
aggregate
then merge
. This version is good if you have more than one "key" (ProjectNumber
here).
keep <- subset(aggregate(Age ~ ProjectNumber, data = df1,
FUN = function(z) diff(range((z))) >= 5), Age)
keep
# ProjectNumber Age
# 1 1 TRUE
# 3 3 TRUE
merge(df1, keep[,1,drop=FALSE], by = "ProjectNumber")
# ProjectNumber Age
# 1 1 50
# 2 1 52
# 3 1 53
# 4 1 55
# 5 1 56
# 9 3 40
# 10 3 41
# 11 3 42
# 12 3 43
# 13 3 45
# 14 3 46
aggregate
then %in%
:
### 'keep' from above
subset(df1, ProjectNumber %in% keep$ProjectNumber)
ave
then simple subset:
keep <- ave(df1$Age, df1$ProjectNumber,
FUN = function(z) diff(range(z)) >= 5) > 0
keep
# [1] TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE
df1[keep,]
# ProjectNumber Age
# 1 1 50
# 2 1 52
# 3 1 53
# 4 1 55
# 5 1 56
# 9 3 40
# 10 3 41
# 11 3 42
# 12 3 43
# 13 3 45
# 14 3 46
Upvotes: 5
Reputation: 887571
We do a group by filter
i.e. grouped by 'ProjectNumber', check the diff
erence of range
of 'Age' is greater than or equal to 5
library(dplyr)
df1 %>%
group_by(ProjectNumber) %>%
filter(diff(range(Age)) >=5) %>%
ungroup
df1 <- structure(list(ProjectNumber = c(1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L), Age = c(50L, 52L, 53L, 55L, 56L,
35L, 36L, 37L, 40L, 41L, 42L, 43L, 45L, 46L)), class = "data.frame",
row.names = c(NA,
-14L))
Upvotes: 2