Tom

Creating a variable by group for sample data

I have a sample database (which I did not make myself), created as follows:

library(data.table)

panelID <- 1:50
year    <- c(2005, 2010)
country <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
n       <- 2

set.seed(123)
DT <- data.table(country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                 year    = c(replicate(length(panelID), sample(year, n))))
DT[, uniqueID := .I]            # creates a unique row ID
DT[, sales := NA]               # no sales figures in this sample: the column is all NA
DT[DT == 0] <- NA               # recode zeros as missing
DT$sales[DT$sales < 0] <- NA    # recode negative sales as missing
DT <- as.data.frame(DT)

I always struggle when I need to create a new variable that has to meet certain conditions.

I would like to create a tax rate for my sample database. The tax rate has to be the same within each country-year, lie between 10% and 40%, and differ by no more than 5% between years within the same country.
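The three conditions can be written down as a check that any candidate solution has to pass. A minimal base R sketch, assuming the rates are stored in percent in a column named tax_rate (as in the answer) and reading "not more than 5% apart" as a spread of at most 5 percentage points within a country; the function name check_tax_rates and the toy data are hypothetical:

```r
# Hypothetical helper: TRUE if a data frame satisfies the three tax-rate rules.
# Assumes columns country, year and tax_rate, with rates in percent (26 = 26%).
check_tax_rates <- function(df, lower = 10, upper = 40, max_spread = 5) {
  # 1. Exactly one distinct rate in every country-year group
  n_rates <- tapply(df$tax_rate, paste(df$country, df$year),
                    function(x) length(unique(x)))
  one_rate_per_group <- all(n_rates == 1)

  # 2. Every rate lies within [lower, upper]
  in_range <- all(df$tax_rate >= lower & df$tax_rate <= upper)

  # 3. Within a country, rates differ by at most max_spread percentage points
  spread <- tapply(df$tax_rate, df$country, function(x) max(x) - min(x))
  small_spread <- all(spread <= max_spread)

  one_rate_per_group && in_range && small_spread
}

# Usage on a tiny hand-made example
ok  <- data.frame(country  = c("A", "A", "A", "B"),
                  year     = c(2005, 2005, 2010, 2005),
                  tax_rate = c(20, 20, 23, 35))
bad <- transform(ok, tax_rate = c(20, 21, 23, 35))  # two different rates for A-2005
check_tax_rates(ok)   # TRUE
check_tax_rates(bad)  # FALSE
```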

I cannot seem to figure out how to do it. It would be great if someone could point me in the right direction.

Answers (1)

Martin Gal

I'm not 100% sure what you are looking for, but you could use dplyr:

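library(dplyr)

DT %>%
  group_by(country) %>%
  mutate(base_rate = as.integer(runif(1, 12.5, 37.5))) %>%        # one base rate per country (12 to 37)
  group_by(country, year) %>%
  mutate(tax_rate = base_rate + as.integer(runif(1, -2.5, 2.5)))   # plus an offset of -2 to 2 per country-year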

which returns

# A tibble: 100 x 6
# Groups:   country, year [20]
   country  year uniqueID sales base_rate tax_rate
   <chr>   <dbl>    <int> <lgl>     <int>    <int>
 1 C        2005        1 NA           26       26
 2 C        2010        2 NA           26       26
 3 C        2010        3 NA           26       26
 4 C        2005        4 NA           26       26
 5 J        2005        5 NA           21       21
 6 J        2010        6 NA           21       20
 7 B        2010        7 NA           20       20
 8 B        2005        8 NA           20       22
 9 F        2010        9 NA           26       26
10 F        2005       10 NA           26       26

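I first created a random base_rate per country and then added a random offset per country and year. Since as.integer() truncates toward zero, base_rate lies between 12 and 37 and the offset between -2 and 2, so every tax_rate falls between 10 and 39 and the rates of one country differ by at most 4 percentage points.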
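Since the data in the question is built with data.table, the same two steps can also be written in data.table syntax. A minimal sketch, assuming the data.table package is installed; the small panel table is made up for illustration and is not the question's data:

```r
library(data.table)

# Made-up panel: 3 countries observed in 2005 and 2010, two rows per country-year
panel <- data.table(country = rep(c("A", "B", "C"), each = 4),
                    year    = rep(c(2005, 2010), each = 2, times = 3))

set.seed(1)
# Step 1: one integer base rate per country, 12 to 37 (runif() is evaluated once per group)
panel[, base_rate := as.integer(runif(1, 12.5, 37.5)), by = country]
# Step 2: one integer offset of -2 to 2 per country-year, added to the base rate
panel[, tax_rate := base_rate + as.integer(runif(1, -2.5, 2.5)), by = .(country, year)]
panel
```

The by argument plays the role of group_by(): the j expression is evaluated once per group and its single value is recycled over the group's rows, and := adds the column by reference instead of returning a copy.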

I used integers, but you could easily replace them with non-integer percentage values.
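For example, here is a sketch of the same dplyr pipeline with non-integer rates expressed as fractions (0.25 = 25%). It assumes dplyr is installed and uses a made-up panel; the draw ranges are an assumption chosen so the constraints still hold: a base rate from 0.125 to 0.375 plus an offset from -0.025 to 0.025 keeps every rate between 0.10 and 0.40, and two rates of the same country can differ by at most 0.05.

```r
library(dplyr)

# Made-up panel: 3 countries observed in 2005 and 2010, two rows per country-year
panel <- data.frame(country = rep(c("A", "B", "C"), each = 4),
                    year    = rep(c(2005, 2010), each = 2, times = 3))

set.seed(1)
rates <- panel %>%
  group_by(country) %>%
  # one base rate per country, drawn from (0.125, 0.375)
  mutate(base_rate = runif(1, 0.125, 0.375)) %>%
  group_by(country, year) %>%
  # one offset per country-year from (-0.025, 0.025), so a country's rates differ by at most 0.05
  mutate(tax_rate = base_rate + runif(1, -0.025, 0.025)) %>%
  ungroup()
rates
```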
