Reputation: 27
I have a data frame structured as follows:
+----+------+
| ID | year |
+----+------+
|  1 | 2002 |
|  1 | 2003 |
|  1 | 2004 |
|  2 | 2015 |
|  2 | 2016 |
|  2 | 2017 |
|  2 | 2018 |
|  3 | 2004 |
|  3 | 2005 |
+----+------+
I would like to add a variable which flags the first (or earliest) occurrence within ID to get the following:
+----+------+------+
| ID | year | flag |
+----+------+------+
|  1 | 2002 |    1 |
|  1 | 2003 |    0 |
|  1 | 2004 |    0 |
|  2 | 2015 |    1 |
|  2 | 2016 |    0 |
|  2 | 2017 |    0 |
|  2 | 2018 |    0 |
|  3 | 2004 |    1 |
|  3 | 2005 |    0 |
+----+------+------+
Is there an easy way to do this in dplyr?
Upvotes: 0
Views: 192
Reputation: 101628
Another base R option using ave. It flags the first row of each ID, so it assumes the rows are already sorted by year within ID:
transform(
df,
flag = ave(seq_len(nrow(df)), ID, FUN = function(x) seq_along(x) == 1)
)
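If the rows might not be sorted, comparing each year with its group minimum may be safer. The following is only a sketch under assumptions: the data frame is rebuilt from the table in the question, and the reversed row order is added here purely to show that the result does not depend on ordering.

```r
# Sample data from the question; the reversal is an assumption for illustration
df <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  year = c(2002, 2003, 2004, 2015, 2016, 2017, 2018, 2004, 2005)
)
df <- df[rev(seq_len(nrow(df))), ]

# Flag the row(s) holding the earliest year within each ID.
# ave() writes the logical result back into a numeric vector, giving 1/0.
df$flag <- ave(df$year, df$ID, FUN = function(x) x == min(x))

# Sort back by ID and year for display
df <- df[order(df$ID, df$year), ]
df
```

Note that if an ID has several rows sharing the minimum year, all of them are flagged.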
Upvotes: 0
Reputation: 887153
With dplyr, we can group by 'ID', create a logical vector based on the min value of 'year', and coerce it to binary with +
df1 %>%
group_by(ID) %>%
mutate(flag = +(year == min(year)))
If the data is already ordered by 'year' within 'ID', no grouping is needed (this returns TRUE/FALSE; prefix it with + to get 1/0)
df1 %>%
mutate(flag = !duplicated(ID))
Or, if 'year' is already ordered, the same thing in base R
df1$flag <- !duplicated(df1$ID)
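To see why the ordering caveat matters, here is a small sketch comparing the two approaches on deliberately reversed data. The data frame is rebuilt from the question's table; the names shuffled, by_min and by_dup are made up for this illustration.

```r
library(dplyr)

# Sample data from the question
df1 <- data.frame(
  ID   = c(1, 1, 1, 2, 2, 2, 2, 3, 3),
  year = c(2002, 2003, 2004, 2015, 2016, 2017, 2018, 2004, 2005)
)

# Reverse the rows so that 'year' is no longer ascending within 'ID'
shuffled <- df1[rev(seq_len(nrow(df1))), ]

res <- shuffled %>%
  group_by(ID) %>%
  mutate(by_min = +(year == min(year))) %>%  # earliest year, order-independent
  ungroup() %>%
  mutate(by_dup = +(!duplicated(ID))) %>%    # first row seen per ID, order-dependent
  arrange(ID, year)

res
```

On the reversed data, by_min still marks 2002, 2015 and 2004, while by_dup marks the latest year of each ID, because that row now comes first.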
Upvotes: 4