user2306162

Reputation: 13

R - loop - save the whole loop output

My data file (obs) looks roughly like this (only the first six lines are shown for illustration):

date        time    station variable 1 variable 2
22/04/2013    05      10394          4          3
22/04/2013    04      10393          3          5
22/04/2013    07      10389          6          6  
22/04/2013    04      20987          8          1
22/04/2013    02      29483          9          3
22/04/2013    03      49893          5          7

I have several lists of station numbers, one per region, each with a different number of stations. For a given list, I want to keep only the rows of the original data (obs) whose station number appears in that list and save them to the variable test03; rows whose station number is not in the list should be dropped.

example station list:

10394
10393
10389
29483

For only four stations, I did it like this:

bed <- (obs$station == 10394 | obs$station == 10393 | obs$station == 10389 | obs$station == 29483)

test03 <- obs[bed,]

test03 then looks like this:

date      time  station  variable 1  variable 2
22/04/2013  05    10394           4           3
22/04/2013  04    10393           3           5
22/04/2013  07    10389           6           6
22/04/2013  02    29483           9           3

So far, so good. But how can I do the same without typing in each station separately (for example, if I have more than 100 stations)? I tried a for loop, but then test03 contained only the last station instead of all of them.

Upvotes: 1

Views: 95

Answers (2)

Simon O'Hanlon

Reputation: 59970

Two quick ways I can think of:

If you have one row in your dataframe for each station, then match is a good option:

df <- data.frame( stations = letters[1:26] , var = runif(26) )
stations <- c("a","b","j")

df[ match( stations , df$stations ) , ]
   stations         var
1         a 0.311261693
2         b 0.002061808
10        j 0.343057454

If you have multiple entries for each station in your dataframe, then subsetting with the %in% operator should do what you are after:

df[ df$stations %in% stations , ]
   stations         var
1         a 0.311261693
2         b 0.002061808
10        j 0.343057454
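
To make the difference between the two approaches concrete, here is a small sketch with made-up data in which station "a" appears twice (the values and the names by_match and by_in are only for illustration). match returns the position of the first hit for each requested station, while %in% keeps every matching row:

```r
# Hypothetical data: station "a" occurs in two rows
df <- data.frame(stations = c("a", "b", "a", "j", "k"),
                 var      = c(1, 2, 3, 4, 5))
stations <- c("a", "b", "j")

# match() gives one index per requested station (the first occurrence),
# so the second "a" row is dropped
by_match <- df[match(stations, df$stations), ]

# %in% builds a logical mask over all rows, so both "a" rows are kept
by_in <- df[df$stations %in% stations, ]

nrow(by_match)  # 3
nrow(by_in)     # 4
```

Note also that match returns NA for a station that is not in the dataframe, which produces a row of NAs when used as an index, so %in% is probably the safer default here.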

Upvotes: 1

agstudy

Reputation: 121568

Use %in% to test against the whole list. For example, something like this:

transform(obs,
           bed =  station %in% c(10394,10393,10389,29483))

        date time station variable1 variable2   bed
1 22/04/2013    5   10394         4         3  TRUE
2 22/04/2013    4   10393         3         5  TRUE
3 22/04/2013    7   10389         6         6  TRUE
4 22/04/2013    4   20987         8         1 FALSE
5 22/04/2013    2   29483         9         3  TRUE
6 22/04/2013    3   49893         5         7 FALSE

Or, more simply, to get only the matching rows:

obs[obs$station %in% c(10394,10393,10389,29483),]

       date time station variable1 variable2
1 22/04/2013    5   10394         4         3
2 22/04/2013    4   10393         3         5
3 22/04/2013    7   10389         6         6
5 22/04/2013    2   29483         9         3
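
As for the for loop: the question does not show it, but a loop that assigns test03 on every pass will overwrite the previous result, which would explain why only the last station survived. A minimal sketch, assuming the station numbers are kept in a vector (station_list is a hypothetical name) and rebuilding the six sample rows by hand, shows a loop that accumulates the condition instead, next to the %in% one-liner:

```r
# Sample rows from the question, rebuilt by hand for a self-contained example
obs <- data.frame(
  date      = rep("22/04/2013", 6),
  time      = c(5, 4, 7, 4, 2, 3),
  station   = c(10394, 10393, 10389, 20987, 29483, 49893),
  variable1 = c(4, 3, 6, 8, 9, 5),
  variable2 = c(3, 5, 6, 1, 3, 7)
)

# Hypothetical name; in practice this could be read from a file
station_list <- c(10394, 10393, 10389, 29483)

# Loop version: start with an all-FALSE mask and OR in each station,
# rather than overwriting the result on every iteration
bed <- rep(FALSE, nrow(obs))
for (s in station_list) {
  bed <- bed | obs$station == s
}
test03_loop <- obs[bed, ]

# Vectorised version: same rows, no loop needed
test03 <- obs[obs$station %in% station_list, ]

identical(test03_loop, test03)
```

With one station vector per region, the same %in% line can be reused by swapping in the relevant vector.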

Upvotes: 0
