Reputation: 87
I'm trying to create a new variable in R that holds the initial value of another variable (Crime) for each group (country), where "initial" means the first time period observed for that group (this is panel data). My current data looks like this:
country | year | Crime |
---|---|---|
Albania | 2016 | 2.7369478 |
Albania | 2017 | 2.0109779 |
Argentina | 2002 | 9.474084 |
Argentina | 2003 | 7.7898825 |
Argentina | 2004 | 6.0739941 |
And I want it to look like this:
country | year | Crime | Initial_Crime |
---|---|---|---|
Albania | 2016 | 2.7369478 | 2.7369478 |
Albania | 2017 | 2.0109779 | 2.7369478 |
Argentina | 2002 | 9.474084 | 9.474084 |
Argentina | 2003 | 7.7898825 | 9.474084 |
Argentina | 2004 | 6.0739941 | 9.474084 |
I saw that ddply could do this, but the problem is that it is no longer supported by the latest R updates.
Thank you in advance.
Upvotes: 1
Views: 118
Reputation: 24832
library(data.table)
# .SD[1, Crime] is the Crime value in the first row of each country group
setDT(data)[, Initial_Crime := .SD[1, Crime], by = country]
country year Crime Initial_Crime
1: Albania 2016 2.736948 2.736948
2: Albania 2017 2.010978 2.736948
3: Argentina 2002 9.474084 9.474084
4: Argentina 2003 7.789883 9.474084
5: Argentina 2004 6.073994 9.474084
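Taking the first row per country gives the initial value only when the rows are already sorted by year within each country. As a minimal base R sketch that does not depend on row order (the data frame name df and the shuffled example rows are assumptions for illustration; the values are the ones from the question):

```r
# Example data from the question, deliberately shuffled (hypothetical ordering)
df <- data.frame(
  country = c("Argentina", "Albania", "Argentina", "Albania", "Argentina"),
  year    = c(2003L, 2017L, 2002L, 2016L, 2004L),
  Crime   = c(7.7898825, 2.0109779, 9.474084, 2.7369478, 6.0739941),
  stringsAsFactors = FALSE
)

# For each country, find the row index of its earliest year;
# ave() recycles that single index across all rows of the group
first_idx <- ave(
  seq_len(nrow(df)), df$country,
  FUN = function(i) i[which.min(df$year[i])]
)

# Look up Crime at the earliest-year row of each country
df$Initial_Crime <- df$Crime[first_idx]
```

Using which.min(year) picks the earliest observed period explicitly, so the result should stay the same however the rows are arranged.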
Upvotes: 1
Reputation: 30494
Maybe arrange by year, then, after grouping by country, set Initial_Crime to be the first Crime in the group.
library(tidyverse)
df %>%
  arrange(year) %>%
  group_by(country) %>%
  mutate(Initial_Crime = first(Crime))
Output
country year Crime Initial_Crime
<chr> <int> <dbl> <dbl>
1 Argentina 2002 9.47 9.47
2 Argentina 2003 7.79 9.47
3 Argentina 2004 6.07 9.47
4 Albania 2016 2.74 2.74
5 Albania 2017 2.01 2.74
Upvotes: 1
Reputation: 1873
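A data.table solution
library(data.table)
setDT(df)
# Flag the first row per country, copy its Crime value, then carry it forward.
# Assumes rows are sorted by year within each country.
df[, x := 1:.N, country
][x==1, initial_crime := Crime
][, initial_crime := nafill(initial_crime, type = "locf")
][, x := NULL
]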
Upvotes: 0