Reputation: 21
Suppose I have a data frame with three variables as below. How do I create a new variable that for each group takes the first observation of x
?
group year x
1 2000 3
1 2001 4
2 2000 1
2 2001 3
3 2000 5
3 2001 2
I want to to create something like this:
group year x y
1 2000 3 3
1 2001 4 3
2 2000 1 1
2 2001 3 1
3 2000 5 5
3 2001 2 5
Upvotes: 2
Views: 1015
Reputation: 226532
Set up data for example:
dd <- data.frame(group=rep(1:3,each=2),
year=rep(2000:2001,3),
x=c(3,4,1,3,5,2))
In base R, use ave()
. By default this finds the group average (rather than the first value), but we can use the FUN
argument to ask it to select the first value instead.
dd$y <- ave(dd$x, dd$group, FUN=function(x) x[1])
## or
dd <- transform(dd,y=ave(x, group, FUN=function(x) x[1])
(alternatively could use FUN=function(x) head(x,1)
)
In tidyverse,
library(dplyr)
dd <- dd %>%
group_by(group) %>%
mutate(y=first(x))
@lmo points out another alternative in comments:
library(data.table)
setDT(dd)[, y := first(x), by=group]
You can find nearly endless discussion of the relative merits of these three major approaches (base R vs tidyverse vs data.table) elsewhere (on StackOverflow and on the interwebs generally).
Upvotes: 3
Reputation: 1482
Here's yet another way, using base R:
dd <- data.frame(group = rep(1:3, each = 2),
year = rep(2000:2001, 3),
x = c(3, 4, 1, 3, 5, 2))
transform(dd, y = unsplit(tapply(x, group, function(x) x[1]), group))
Upvotes: 0
Reputation: 37661
A simple version in base R
### Your data
df = read.table(text="group year x
1 2000 3
1 2001 4
2 2000 1
2 2001 3
3 2000 5
3 2001 2",
header=TRUE)
df$y = aggregate(as.numeric(row.names(df)), list(df$group), min)$x[df$group]
df
group year x y
1 1 2000 3 1
2 1 2001 4 1
3 2 2000 1 3
4 2 2001 3 3
5 3 2000 5 5
6 3 2001 2 5
Upvotes: 0
Reputation: 1576
Using package plyr
:
df <- data.frame(group=c(1,1,2,2,3,3),
year=c(2000,2001,2000,2001,2000,2001),
x=c(3,4,1,3,5,2))
library(plyr)
ddply(df, .(group), transform, y=x[1])
Upvotes: 1