Reputation: 687
Let's say I have data on two firms that are each expanding to new cities. I have the dates of each additional store added in each city:
df <- data.frame(
firm = c(rep(1,4), rep(2,4)),
date = as.Date(c('2017-01-01', '2017-03-01', '2017-05-01',
'2017-06-01', '2017-02-01', '2017-04-01',
'2017-05-01', '2017-06-01')),
city = c('New York', 'DC', 'New York', 'Atlanta', 'DC', 'DC',
'Chicago', 'Atlanta'),
numStores = c(1, 1, 2, 1, 1, 2, 1, 1))
I want to create a column of integers that tells me the order that each firm entered each city based on the date
variable. The result should be this:
df$cityOrder = c(1,2,1,3,1,1,2,3)
Upvotes: 0
Views: 1262
Reputation: 47300
You can us factors
with tapply
in base R:
within(df,cityOrder <- unlist(tapply(city, firm, function(x)
as.numeric(factor(x,unique(x))))))
# firm date city numStores cityOrder
# 1 1 2017-01-01 New York 1 1
# 2 1 2017-03-01 DC 1 2
# 3 1 2017-05-01 New York 2 1
# 4 1 2017-06-01 Atlanta 1 3
# 5 2 2017-02-01 DC 1 1
# 6 2 2017-04-01 DC 2 1
# 7 2 2017-05-01 Chicago 1 2
# 8 2 2017-06-01 Atlanta 1 3
(IF your dates are sorted for each firm as they are in this example)
Upvotes: 1
Reputation: 4378
A dplyr solution.
library(dplyr)
# create a summary by firm by city
df_summary <- df %>%
# sequence by earliest date by firm and city
group_by(firm, city) %>%
summarize(min_date = min(date)) %>%
# number (using row_number) *within* the firm group, arranged by date
arrange(firm, min_date) %>%
group_by(firm) %>%
mutate(cityOrder = row_number()) %>%
# drop the date column
select(firm, city, cityOrder)
print(df_summary)
# A tibble: 6 x 3
# Groups: firm [2]
# firm city cityOrder
# <dbl> <fct> <int>
# 1 1. New York 1
# 2 1. DC 2
# 3 1. Atlanta 3
# 4 2. DC 1
# 5 2. Chicago 2
# 6 2. Atlanta 3
## join the summary city order to the original
df_new <- df %>%
left_join(df_summary, by = c(firm = "firm", city = "city"))
print (df_new$cityOrder)
# [1] 1 2 1 3 1 1 2 3
Upvotes: 1
Reputation: 18425
Using dplyr
...
library(dplyr)
df %>% left_join(df %>% filter(numStores==1) %>%
group_by(firm) %>%
mutate(cityOrder=order(date)) %>%
select(-numStores,-date))
firm date city numStores cityOrder
1 1 2017-01-01 New York 1 1
2 1 2017-03-01 DC 1 2
3 1 2017-05-01 New York 2 1
4 1 2017-06-01 Atlanta 1 3
5 2 2017-02-01 DC 1 1
6 2 2017-04-01 DC 2 1
7 2 2017-05-01 Chicago 1 2
8 2 2017-06-01 Atlanta 1 3
Upvotes: 4
Reputation:
One way:
library(dplyr)
first_entry <- df %>%
group_by(firm, city) %>%
summarize(first = min(date)) %>%
group_by(firm) %>%
mutate(order = order(first)) %>%
select(-first)
left_join(df, first_entry)
firm date city numStores order
1 1 2017-01-01 New York 1 1
2 1 2017-03-01 DC 1 2
3 1 2017-05-01 New York 2 1
4 1 2017-06-01 Atlanta 1 3
5 2 2017-02-01 DC 1 1
6 2 2017-04-01 DC 2 1
7 2 2017-05-01 Chicago 1 2
8 2 2017-06-01 Atlanta 1 3
Upvotes: 2