Bold Sambuu
Bold Sambuu

Reputation: 21

How to create a new variable that takes the first observation of a different variable?

Suppose I have a data frame with three variables as below. How do I create a new variable that for each group takes the first observation of x?

group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2    

I want to to create something like this:

group  year   x    y 
1      2000   3    3
1      2001   4    3
2      2000   1    1
2      2001   3    1
3      2000   5    5 
3      2001   2    5

Upvotes: 2

Views: 1015

Answers (4)

Ben Bolker
Ben Bolker

Reputation: 226532

Set up data for example:

dd <- data.frame(group=rep(1:3,each=2),
                year=rep(2000:2001,3),
                x=c(3,4,1,3,5,2))

In base R, use ave(). By default this finds the group average (rather than the first value), but we can use the FUN argument to ask it to select the first value instead.

dd$y <- ave(dd$x, dd$group, FUN=function(x) x[1])
## or
dd <- transform(dd,y=ave(x, group, FUN=function(x) x[1])

(alternatively could use FUN=function(x) head(x,1))

In tidyverse,

library(dplyr)
dd <- dd %>%
        group_by(group) %>%
        mutate(y=first(x))

@lmo points out another alternative in comments:

library(data.table)
setDT(dd)[, y := first(x), by=group]

You can find nearly endless discussion of the relative merits of these three major approaches (base R vs tidyverse vs data.table) elsewhere (on StackOverflow and on the interwebs generally).

Upvotes: 3

Patrick Perry
Patrick Perry

Reputation: 1482

Here's yet another way, using base R:

dd <- data.frame(group = rep(1:3, each = 2),
                 year = rep(2000:2001, 3),
                 x = c(3, 4, 1, 3, 5, 2))

transform(dd, y = unsplit(tapply(x, group, function(x) x[1]), group))

Upvotes: 0

G5W
G5W

Reputation: 37661

A simple version in base R

### Your data
df = read.table(text="group  year   x     
1      2000   3    
1      2001   4    
2      2000   1    
2      2001   3    
3      2000   5     
3      2001   2",
header=TRUE)

df$y = aggregate(as.numeric(row.names(df)), list(df$group), min)$x[df$group]
df
  group year x y
1     1 2000 3 1
2     1 2001 4 1
3     2 2000 1 3
4     2 2001 3 3
5     3 2000 5 5
6     3 2001 2 5

Upvotes: 0

user3640617
user3640617

Reputation: 1576

Using package plyr:

df <- data.frame(group=c(1,1,2,2,3,3), 
                 year=c(2000,2001,2000,2001,2000,2001), 
                 x=c(3,4,1,3,5,2))

library(plyr)
ddply(df, .(group), transform, y=x[1])

Upvotes: 1

Related Questions