Reputation: 423
I have two csv files:
File 1:
Year,Month,Day,Stn1,Stn2,Stn3
1979,01,01,10,0,5
1979,01,02,10,1,5
1979,01,03,0,0,0
1979,01,04,5,10,30
1979,01,05,0,1,3
File 2:
Year,Month,Day
1979,01,02
1979,01,04
1979,01,05
File 1 contains daily data from 1979 to 2000, while File 2 contains random (non-continuous) dates.
What I want:
[1] Get the common dates between File 1 and File 2 if ANY of the station columns (Stn1 to Stn3) has a value greater than or equal to 20. Then, save the output to a file.
In the above example, the output file should contain the following date:
Year,Month,Day,Stn1,Stn2,Stn3
1979,01,04,5,10,30
Since Stn3 has a value of 30.
What I have so far:
I can get the common dates with a simple bash command. Unfortunately, I don't know how to filter the common dates so that only those satisfying the condition remain. I was wondering how to do this in R.
I'll appreciate any help on this matter.
-- Lyndz
Upvotes: 1
Views: 1931
Reputation: 10855
There are many ways to do this in R. Using base R, we can merge() the two files, using the [ form of the extract operator to include only rows from df1 where Stn3 >= 20.
textFile1 <- "Year,Month,Day,Stn1,Stn2,Stn3
1979, 01, 01, 10, 0, 5
1979, 01, 02, 10, 1, 5
1979, 01, 03, 0, 0, 0
1979, 01, 04, 5,10,30
1979, 01, 05, 0,1,3"
textFile2 <- "Year,Month,Day
1979,01,02
1979,01,04
1979,01,05"
df1 <- read.csv(text = textFile1,header = TRUE)
df2 <- read.csv(text = textFile2,header = TRUE)
merge(x = df1[df1$Stn3 >=20,],y = df2, by = c("Year","Month","Day"))
...and the output:
> merge(x = df1[df1$Stn3 >=20,],y = df2, by = c("Year","Month","Day"))
Year Month Day Stn1 Stn2 Stn3
1 1979 1 4 5 10 30
>
By slightly editing the code, we can check for any value in Stn1 to Stn3 at or above 20.
merge(x = df1[rowSums((df1[4:6] >=20))>0,],y = df2, by = c("Year","Month","Day"))
The key change is rowSums((df1[4:6] >=20))>0 in the row dimension of the extract operator for df1. If we evaluate this step by step, first df1[4:6] >= 20 is evaluated, and returns a matrix of logical values.
> df1[4:6] >=20
Stn1 Stn2 Stn3
[1,] FALSE FALSE FALSE
[2,] FALSE FALSE FALSE
[3,] FALSE FALSE FALSE
[4,] FALSE FALSE TRUE
[5,] FALSE FALSE FALSE
Next, we use rowSums() to produce a vector that has one element per row in df1 (the number of station columns at or above 20 in that row), and use this to subset df1 before merging with df2.
> rowSums((df1[4:6] >=20))
[1] 0 0 0 1 0
>
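The same row filter can also be written with column names instead of positions, which may be safer if the column order in the real file differs from the sample. This is only a minimal sketch on the sample data; the names stn_cols and hit are illustrative and not part of the code above.

```r
# Sample data from the question (File 1)
df1 <- read.csv(text = "Year,Month,Day,Stn1,Stn2,Stn3
1979,01,01,10,0,5
1979,01,02,10,1,5
1979,01,03,0,0,0
1979,01,04,5,10,30
1979,01,05,0,1,3")

# Pick the station columns by name rather than by position 4:6
stn_cols <- grep("^Stn", names(df1), value = TRUE)

# TRUE for rows where at least one station is at or above 20
hit <- rowSums(df1[stn_cols] >= 20) > 0

# Keep only the matching rows
df1[hit, ]
```

If the real data contain missing values, rowSums(..., na.rm = TRUE) is probably needed so that an NA in one station does not turn the whole row's result into NA.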
Since only 1 observation has a value >= 20 for Stn1 to Stn3, we obtain the same result as illustrated above.
When we alter the data so that a second row also has a value >= 20 and merge again, the merge() function returns 2 rows.
# now update the data so that a second row (row 2, Stn2) has a value >= 20
df1[2,5] <- 50
merge(x = df1[rowSums((df1[4:6] >=20))>0,],y = df2, by = c("Year","Month","Day"))
...and the output:
> merge(x = df1[rowSums((df1[4:6] >=20))>0,],y = df2, by = c("Year","Month","Day"))
Year Month Day Stn1 Stn2 Stn3
1 1979 1 2 10 50 5
2 1979 1 4 5 10 30
>
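The question also asks to save the result to a file, and its desired output keeps the zero-padded Month and Day, whereas the merged output above prints them as 1 and 4. A minimal sketch of one way to handle both, assuming the date columns can be treated as text: colClasses reads them as character so "01" is preserved, and write.csv() writes the result. The output path is a temporary file used purely for illustration, since the real file names are not given in the question.

```r
textFile1 <- "Year,Month,Day,Stn1,Stn2,Stn3
1979,01,01,10,0,5
1979,01,02,10,1,5
1979,01,03,0,0,0
1979,01,04,5,10,30
1979,01,05,0,1,3"
textFile2 <- "Year,Month,Day
1979,01,02
1979,01,04
1979,01,05"

# Read the date columns as character so "01" is not converted to 1
df1 <- read.csv(text = textFile1,
                colClasses = c(Year = "character",
                               Month = "character",
                               Day = "character"))
df2 <- read.csv(text = textFile2, colClasses = "character")

# Same filter and merge as above
common <- merge(x = df1[rowSums(df1[4:6] >= 20) > 0, ], y = df2,
                by = c("Year", "Month", "Day"))

# Hypothetical output location; replace with a real path
out_file <- tempfile(fileext = ".csv")
write.csv(common, out_file, row.names = FALSE, quote = FALSE)

# Inspect what was written
readLines(out_file)
```

With real files, read.csv("path/to/file.csv", ...) would replace the text = arguments; the paths themselves are assumptions to fill in.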
Upvotes: 2
Reputation: 2294
Try this code:
library(tidyverse)
dataset <- data.frame(Year = c("1979","1979","1979","1979","1979"),
Month = c("01","01","01","01","01"),
Day = c("01","02","03","04","05"),
Stn1 = c(10,10,0,5,0),
Stn2 = c(0,1,0,10,1),
Stn3 = c(5,5,0,30,3),
stringsAsFactors = FALSE)
dataset <- dataset %>% mutate(date = paste0(Year,Month,Day))
filterdata <- data.frame(Year = c("1979","1979","1979"),
Month = c("01","01","01"),
Day = c("02","04","05"),
stringsAsFactors = FALSE)
filterdata <- filterdata %>% mutate(date = paste0(Year,Month,Day))
dataset %>%
  semi_join(filterdata, by = 'date') %>%
  filter(Stn1 >= 20 | Stn2 >= 20 | Stn3 >= 20) %>%
  select(-date)
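semi_join() keeps only the rows of dataset whose date also appears in filterdata, and filter() then keeps the rows where at least one station meets the threshold by combining the conditions with OR (|).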
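With more station columns, writing one OR term per column gets tedious. A possible variant, sketched here under the assumption that dplyr 1.0.4 or later is installed (for if_any()) and that every station column name starts with "Stn": join directly on the three date columns and test all station columns at once. The name result is illustrative.

```r
library(dplyr)

dataset <- data.frame(Year = rep("1979", 5),
                      Month = rep("01", 5),
                      Day = c("01", "02", "03", "04", "05"),
                      Stn1 = c(10, 10, 0, 5, 0),
                      Stn2 = c(0, 1, 0, 10, 1),
                      Stn3 = c(5, 5, 0, 30, 3),
                      stringsAsFactors = FALSE)
filterdata <- data.frame(Year = rep("1979", 3),
                         Month = rep("01", 3),
                         Day = c("02", "04", "05"),
                         stringsAsFactors = FALSE)

result <- dataset %>%
  # Keep rows whose Year/Month/Day appear in filterdata (no helper column)
  semi_join(filterdata, by = c("Year", "Month", "Day")) %>%
  # Keep rows where any column starting with "Stn" is at or above 20
  filter(if_any(starts_with("Stn"), ~ .x >= 20))

result
```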
Regards.
Upvotes: 1