My data file (obs) looks approximately like this (only the first six lines are shown for illustration):
date time station variable 1 variable 2
22/04/2013 05 10394 4 3
22/04/2013 04 10393 3 5
22/04/2013 07 10389 6 6
22/04/2013 04 20987 8 1
22/04/2013 02 29483 9 3
22/04/2013 03 49893 5 7
I have several lists of station numbers, one per region, and each list contains a different number of stations. I want to keep only the rows of the original data file (obs) whose station number is contained in a given station list and save them to the variable test03. Rows whose station numbers are not in the list should be left out.
Example station list:
10394
10393
10389
29483
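If a station list is stored as a plain text file with one number per line, like the example above, scan() can read it into a numeric vector, so the numbers never have to be typed into a condition. In this sketch the text argument stands in for such a file; with a real file, its path (hypothetical, e.g. "stations_region1.txt") would be passed as the first argument instead.

```r
# text = stands in for a hypothetical one-number-per-line station file.
station_list <- scan(text = "10394\n10393\n10389\n29483", quiet = TRUE)
station_list  # [1] 10394 10393 10389 29483
```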
For just four stations, I did it like this:
bed <- (obs$station == 10394 | obs$station == 10393 | obs$station == 10389 | obs$station == 29483)
test03 <- obs[bed,]
test03 then looks like this:
date time station variable 1 variable 2
22/04/2013 05 10394 4 3
22/04/2013 04 10393 3 5
22/04/2013 07 10389 6 6
22/04/2013 02 29483 9 3
So far, so good. But how can I do the same without typing in each station separately (for example, if I have 100 or more stations)? I tried a for loop, but then test03 only contained the last station instead of all of them.
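The question does not show the for loop, so this is only a guess at the failure: if test03 is assigned inside the loop, each pass overwrites the previous result, leaving just the last station. The minimal sketch below shows that pitfall and a loop that instead accumulates the condition with |. Here obs is rebuilt from the six sample rows, with time kept as character.

```r
# Rebuild the six sample rows of obs from the question.
obs <- data.frame(
  date      = rep("22/04/2013", 6),
  time      = c("05", "04", "07", "04", "02", "03"),
  station   = c(10394, 10393, 10389, 20987, 29483, 49893),
  variable1 = c(4, 3, 6, 8, 9, 5),
  variable2 = c(3, 5, 6, 1, 3, 7)
)
stations <- c(10394, 10393, 10389, 29483)

# Suspected pitfall: test03 is replaced on every pass of the loop.
for (s in stations) {
  test03 <- obs[obs$station == s, ]
}
n_after_overwrite <- nrow(test03)  # 1: only the row for station 29483 is left

# Accumulate the logical condition instead (== binds tighter than |).
bed <- rep(FALSE, nrow(obs))
for (s in stations) {
  bed <- bed | obs$station == s
}
test03 <- obs[bed, ]
nrow(test03)  # 4
```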
Two quick ways I can think of:
If your data frame has exactly one row per station, then match is a good option:
df <- data.frame( stations = letters[1:26] , var = runif(26) )
stations <- c("a","b","j")
df[ match( stations , df$stations ) , ]
stations var
1 a 0.311261693
2 b 0.002061808
10 j 0.343057454
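This is why match suits the one-row-per-station case. It returns only the position of the first match for each value, and NA when a value is absent, so indexing a data frame with an NA position yields a row of NAs. The sketch below uses a small made-up data frame (df2, in which station "a" appears twice) to show both behaviours. Dropping the NA positions is one way to avoid the empty row.

```r
# Hypothetical data: station "a" appears twice.
df2 <- data.frame(stations = c("a", "b", "a", "j"), var = c(1, 2, 3, 4))
stations <- c("a", "j", "z")

idx <- match(stations, df2$stations)
idx  # 1 4 NA: only the first "a", and NA because "z" is absent

# Drop NA positions so missing stations do not produce all-NA rows.
res <- df2[idx[!is.na(idx)], ]
res
```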
If your data frame has several rows per station, then subsetting with the %in% operator should do what you are after:
df[ df$stations %in% stations , ]
stations var
1 a 0.311261693
2 b 0.002061808
10 j 0.343057454
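The following shows why %in% rather than == is the operator to use. The == operator compares element by element and recycles the shorter vector, with only a warning when the lengths do not divide evenly, so it does not test membership. This sketch uses the station column from the question's sample as a plain vector (obs_station is a name introduced only for illustration).

```r
obs_station <- c(10394, 10393, 10389, 20987, 29483, 49893)
stations <- c(10394, 10393, 10389, 29483)

# == recycles stations to length 6 and compares position by position,
# so 29483 (5th element) is wrongly FALSE. R warns: 6 is not a multiple of 4.
eq <- suppressWarnings(obs_station == stations)
eq   # TRUE TRUE TRUE FALSE FALSE FALSE

# %in% asks, for each left-hand element, whether it occurs anywhere on the right.
inn <- obs_station %in% stations
inn  # TRUE TRUE TRUE FALSE TRUE FALSE
```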
Use %in% to test each station against the whole list at once. For example, something like this:
transform(obs,
bed = station %in% c(10394,10393,10389,29483))
date time station variable1 variable2 bed
1 22/04/2013 5 10394 4 3 TRUE
2 22/04/2013 4 10393 3 5 TRUE
3 22/04/2013 7 10389 6 6 TRUE
4 22/04/2013 4 20987 8 1 FALSE
5 22/04/2013 2 29483 9 3 TRUE
6 22/04/2013 3 49893 5 7 FALSE
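If you want to keep the flag from transform() as an intermediate step, one possible sketch is to filter on it with base R's subset() and drop the helper column through its select argument. Here obs is rebuilt from the question's six sample rows, with time kept as character.

```r
obs <- data.frame(
  date      = rep("22/04/2013", 6),
  time      = c("05", "04", "07", "04", "02", "03"),
  station   = c(10394, 10393, 10389, 20987, 29483, 49893),
  variable1 = c(4, 3, 6, 8, 9, 5),
  variable2 = c(3, 5, 6, 1, 3, 7)
)

# Add a logical flag column, as in the transform() call above.
flagged <- transform(obs, bed = station %in% c(10394, 10393, 10389, 29483))

# Keep the rows where bed is TRUE, then drop the helper column.
test03 <- subset(flagged, bed, select = -bed)
test03
```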
Or, more simply, to get only the matching rows:
obs[obs$station %in% c(10394,10393,10389,29483),]
date time station variable1 variable2
1 22/04/2013 5 10394 4 3
2 22/04/2013 4 10393 3 5
3 22/04/2013 7 10389 6 6
5 22/04/2013 2 29483 9 3
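The question mentions one station list per region, so the same %in% subset can be applied to every list at once with lapply(). The region names north and south and their station lists below are hypothetical, and obs is rebuilt from the question's six sample rows.

```r
obs <- data.frame(
  date      = rep("22/04/2013", 6),
  time      = c("05", "04", "07", "04", "02", "03"),
  station   = c(10394, 10393, 10389, 20987, 29483, 49893),
  variable1 = c(4, 3, 6, 8, 9, 5),
  variable2 = c(3, 5, 6, 1, 3, 7)
)

# Hypothetical region names and station lists.
regions <- list(
  north = c(10394, 10393, 10389, 29483),
  south = c(20987, 49893)
)

# One subset of obs per region; the result is a named list of data frames.
by_region <- lapply(regions, function(st) obs[obs$station %in% st, ])
by_region$north
```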