Reputation: 153

how to to create new variable based on first entry of another variable

I have created df

  gvkey  year 
1  1004  1965    
2  1004  1966       
3  1004  1967      
4  1005  1950        
5  1005  1951      
...

gvkey = company id

My dataframe contains more than 1,200 unique gvkeys with different starting values for year.

I want to create a column in df called ipo_date which equals the first entry in the year column for a particular gvkey.

  gvkey  year  ipo_date
1  1004  1965    1965
2  1004  1966    1965
3  1004  1967    1965
4  1005  1950    1950
5  1005  1951    1950
...

Afterwards I want to delete all duplicate rows for ipo_date and delete the column year

  gvkey  ipo_date
1  1004   1965
2  1005   1950
...

Thank you so much!!!

Upvotes: 0

Answers (4)

Johnny

Reputation: 771

This should do the trick:

library(tidyverse)
df %>% 
  group_by(gvkey) %>% 
  mutate(ipo_date = min(year)) %>% 
  select(-year) %>% 
  distinct()

Output:

# A tibble: 2 x 2
# Groups:   gvkey [2]
  gvkey ipo_date
  <dbl>    <dbl>
1  1004     1965
2  1005     1950

Upvotes: 1

JAQuent

Reputation: 1224

Here is a one line solution:

library(plyr)
df <- data.frame(gvkey = c(1004, 1004, 1004, 1005, 1005),
                 year = c(1965, 1966, 1967, 1950, 1951))

df_agg <- ddply(df, c('gvkey'), summarise, ipo_date = min(year))
> df_agg
  gvkey ipo_date
1  1004     1965
2  1005     1950

Upvotes: 1

tushaR

Reputation: 3116

library(data.table)
df = as.data.table(df)
df = df[order(gvkey,year)]
df[,`:=`(ipo_date=min(year)),by=c("gvkey")][,`:=`(year=NULL)]
df = unique(df)

Upvotes: 1

Ronak Shah

Reputation: 389135

You could select the first row of every gvkey

library(dplyr)
df %>%  group_by(gvkey) %>%  slice(1L) %>%  rename_at(2, ~"ipo_date")

#  gvkey ipo_date
#  <int>    <int>
#1  1004     1965
#2  1005     1950

Using the same logic in data.table and base R, we can do

library(data.table)
setDT(df)[, .SD[1L], gvkey]

and

aggregate(year~gvkey, df, head, 1)

Upvotes: 1

how to to create new variable based on first entry of another variable

Answers (4)

Related Questions