user3570187

Reputation: 1773

Filtering rows based on two columns and a running id

I need to filter the data based on two things: each person's first year and the location in that year. I want to keep all rows for every person whose first location (the row where FirstPlace is 1) is London. Any suggestions on how I can do that? In this example, the result should contain all rows for John, since he lived in London in his first year.

year <- c(2008, 2009, 2010, 2009, 2010, 2011)
person <- c('John', 'John', 'John', 'Brian', 'Brian','Vickey')
location <- c('London','Paris', 'Newyork','Paris','Paris','Miami')
df <- data.frame(year, person, location)

library(dplyr)
df %>% group_by(person) %>% mutate(FirstPlace = +(min(year) == year))

Upvotes: 1

Views: 125

Answers (1)

Jaap

Reputation: 83215

Using data.table:

library(data.table)
setDT(df)[order(year), if(first(location) == 'London') .SD, by = person]

which gives:

   person year location
1:   John 2008   London
2:   John 2009    Paris
3:   John 2010  Newyork

Or with dplyr:

library(dplyr)
df %>% 
  arrange(year) %>% 
  group_by(person) %>% 
  filter(first(location) == 'London')
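
If you would rather keep the FirstPlace flag from the question, a possible variant is to test within each group whether any row has both FirstPlace equal to 1 and location equal to 'London'. This is only a sketch: the name `result` is an assumption for illustration, and it assumes each person has a single row for their earliest year. Unlike the approach above, it does not depend on the rows being sorted by year.

```r
library(dplyr)

# Example data from the question
year <- c(2008, 2009, 2010, 2009, 2010, 2011)
person <- c('John', 'John', 'John', 'Brian', 'Brian', 'Vickey')
location <- c('London', 'Paris', 'Newyork', 'Paris', 'Paris', 'Miami')
df <- data.frame(year, person, location)

result <- df %>%
  group_by(person) %>%
  # flag the row(s) holding each person's earliest year
  mutate(FirstPlace = +(min(year) == year)) %>%
  # keep the whole group if its flagged row is in London
  filter(any(FirstPlace == 1 & location == 'London')) %>%
  ungroup()

print(result)
```

With the example data, only John's group satisfies the condition, so only his rows are kept (together with the FirstPlace column).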

Upvotes: 3
