Reputation: 37
I have a data frame that contains the highest and lowest temperature in a given year by Climate Station - All.Stations
dataset:
Station.Name Year Month Day TMAX TMIN
GRAND MARAIS 1942 7 28 82 60
GRAND MARAIS 1962 3 17 42 22
LEECH LAKE 1956 7 3 72 50
ALBERT LEA 3 SE 1998 1 25 25 15
TWO HARBORS 1933 5 20 77 42
ARGYLE 1922 9 13 NA NA
I also have a data frame of complete years by Climate Station (i.e., these are the years where I have data for every day in the year) - complete.years
dataset:
Station.Name Year
DULUTH 1904
AGASSIZ REFUGE 1995
LEECH LAKE 1956
GRAND MARAIS 1942
LEECH LAKE 1994
I want to filter the first data frame to only the data where Station Name and Year exist and match in the second data frame.
The correct results would be:
Station.Name Year TMAX
GRAND MARAIS 1942 82
LEECH LAKE 1956 72
Here's what I've got so far, using dplyr:
Max.Tempurature <- All_Stations %>%
group_by(Station.Name, Year) %>%
select(Station.Name, Year, TMAX) %>%
filter(min_rank(desc(TMAX)) <= 1) %>%
filter((Year %in% complete.years$Year & Station.Name %in% complete.years$Station.Name))
I can filter by both Year and Station.Name, but that searches the whole data frame for matches.
How do I filter by Station.Name and Year existing in the same observation?
Upvotes: 3
Views: 84
Reputation: 26333
Or with merge
cols <- c('Station.Name', 'Year', 'TMAX')
merge(All.Stations[cols], complete.years, all.x = FALSE)
# Station.Name Year TMAX
#1 GRAND MARAIS 1942 82
#2 LEECH LAKE 1956 72
data
All.Stations <- structure(list(Station.Name = c("GRAND MARAIS", "GRAND MARAIS",
"LEECH LAKE", "ALBERT LEA 3 SE", "TWO HARBORS", "ARGYLE"), Year = c(1942L,
1962L, 1956L, 1998L, 1933L, 1922L), Month = c(7L, 3L, 7L, 1L,
5L, 9L), Day = c(28L, 17L, 3L, 25L, 20L, 13L), TMAX = c(82L,
42L, 72L, 25L, 77L, NA), TMIN = c(60L, 22L, 50L, 15L, 42L, NA
)), .Names = c("Station.Name", "Year", "Month", "Day", "TMAX",
"TMIN"), class = "data.frame", row.names = c(NA, -6L))
complete.years <- structure(list(Station.Name = c("DULUTH", "AGASSIZ REFUGE", "LEECH LAKE",
"GRAND MARAIS", "LEECH LAKE"), Year = c(1904L, 1995L, 1956L,
1942L, 1994L)), .Names = c("Station.Name", "Year"), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 1
Reputation: 886938
We can do an inner_join
library(dplyr)
inner_join(All.Stations[c(1, 2, 5)], complete.years)
# Station.Name Year TMAX
#1 GRAND MARAIS 1942 82
#2 LEECH LAKE 1956 72
All.Stations <- structure(list(Station.Name = c("GRAND MARAIS", "GRAND MARAIS",
"LEECH LAKE", "ALBERT LEA 3 SE", "TWO HARBORS", "ARGYLE"), Year = c(1942L,
1962L, 1956L, 1998L, 1933L, 1922L), Month = c(7L, 3L, 7L, 1L,
5L, 9L), Day = c(28L, 17L, 3L, 25L, 20L, 13L), TMAX = c(82L,
42L, 72L, 25L, 77L, NA), TMIN = c(60L, 22L, 50L, 15L, 42L, NA
)), class = "data.frame", row.names = c(NA, -6L))
complete.years <- structure(list(Station.Name = c("DULUTH",
"AGASSIZ REFUGE", "LEECH LAKE",
"GRAND MARAIS", "LEECH LAKE"), Year = c(1904L, 1995L, 1956L,
1942L, 1994L)), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 2