Reputation: 51
num Name year X Y
1 1 A 2015 68 80%
2 1 A 2016 69 85%
3 1 A 2017 70 95%
4 1 A 2018 71 85%
5 1 A 2019 72 90%
6 2 B 2018 20 80%
7 2 B 2019 23 75%
8 2 C 2014 3 55%
9 4 D 2012 4 75%
10 4 D 2013 5 100%
Let's say I have data like the above. I want to remove the observations that do not have any observations in the most recent year. So, in the above, we would be left with A & B, but C & D would be deleted. The most recent season will always in the data and can be referenced with the max() function (i.e., we don't need to hardcode as 2019 and update it yearly).
The plan is to create a facet wrapped line chart where the percentages are on the y-axis and the years are on the x-axis. The facet would be on the names so each individual will have its own line chart with their percentages by year. We don't care about people who left, so that's why we're dropping records. Though, there is a chance they come back, so I don't want to drop them from the underlying data.
Upvotes: 2
Views: 253
Reputation: 269824
Using the data frame DF
shown in the Note at the end we use semi_join
to reduce it to the required names, convert Y to numeric and plot it. DF
is not modified.
A possible alternative to the semi_join
line is
filter(ave(year == max(year), Name, FUN = any)) %>%
The code is--
library(dplyr)
library(ggplot2)
DF %>%
semi_join(filter(., year == max(year)), by = "Name") %>%
mutate(Y = as.numeric(sub("%", "", Y))) %>%
ggplot(aes(year, Y)) + geom_line() + facet_wrap(~Name)
The input in reproducible form:
Lines <- " num Name year X Y
1 1 A 2015 68 80%
2 1 A 2016 69 85%
3 1 A 2017 70 95%
4 1 A 2018 71 85%
5 1 A 2019 72 90%
6 2 B 2018 20 80%
7 2 B 2019 23 75%
8 2 C 2014 3 55%
9 4 D 2012 4 75%
10 4 D 2013 5 100%"
DF <- read.table(text = Lines)
Upvotes: 2
Reputation: 887391
W can use subset
from base R
as well by subsetting the 'Name' where 'year' is the max
, get the unique
elements and create a logical vector with %in%
to subset the rows
subset(df1, Name %in% unique(Name[year == max(year)]))
# num Name year X Y
#1 1 A 2015 68 80%
#2 1 A 2016 69 85%
#3 1 A 2017 70 95%
#4 1 A 2018 71 85%
#5 1 A 2019 72 90%
#6 2 B 2018 20 80%
#7 2 B 2019 23 75%
No packages are used
Or the similar syntax in dplyr
library(dplyr)
df1 %>%
filter(Name %in% unique(Name[year == max(year)]))
df1 <- structure(list(num = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 4L, 4L
), Name = c("A", "A", "A", "A", "A", "B", "B", "C", "D", "D"),
year = c(2015L, 2016L, 2017L, 2018L, 2019L, 2018L, 2019L,
2014L, 2012L, 2013L), X = c(68L, 69L, 70L, 71L, 72L, 20L,
23L, 3L, 4L, 5L), Y = c("80%", "85%", "95%", "85%", "90%",
"80%", "75%", "55%", "75%", "100%")), class = "data.frame",
row.names = c("1",
"2", "3", "4", "5", "6", "7", "8", "9", "10"))
Upvotes: 2
Reputation: 39868
One dplyr
option could be:
df %>%
group_by(Name) %>%
filter(any(year %in% max(df$year)))
num Name year X Y
<int> <chr> <int> <int> <chr>
1 1 A 2015 68 80%
2 1 A 2016 69 85%
3 1 A 2017 70 95%
4 1 A 2018 71 85%
5 1 A 2019 72 90%
6 2 B 2018 20 80%
7 2 B 2019 23 75%
Upvotes: 3