I have a sample database (which I did not create myself), generated as follows:
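library(data.table)

panelID <- 1:50
year    <- c(2005, 2010)
country <- c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")
n <- 2

set.seed(123)
DT <- data.table(country = rep(sample(country, length(panelID), replace = TRUE), each = n),
                 year    = c(replicate(length(panelID), sample(year, n))))
DT[, uniqueID := .I]           # creates a unique ID
DT[DT == 0] <- NA              # turns any zeros into NA
# `sales` is not generated above; add it as an all-NA placeholder
# (consistent with the all-NA sales column in the output further down)
DT[, sales := NA]
DT$sales[DT$sales < 0] <- NA   # turns negative sales into NA
DT <- as.data.frame(DT)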
I always struggle when I want to create a new variable that has to meet certain conditions.
I would like to create a tax rate for my sample database. The tax rate must be the same within each country-year, lie between 10% and 40%, and the rates of a given country must be no more than 5 percentage points apart.
I cannot seem to figure out how to do it. It would be great if someone could point me in the right direction.
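To pin down what "meeting the conditions" means, the three requirements can be written as a check. This is only a sketch: the helper `check_tax_rates` and its arguments are made up for illustration, it assumes a data frame with `country`, `year` and a hypothetical `tax_rate` column stored in percent (10 means 10%) with no missing values, and it reads "5% apart" as a spread of at most 5 percentage points within a country.

```r
# Hypothetical helper: TRUE if tax_rate satisfies all three conditions.
# Assumes tax_rate is in percent (10 means 10%) and has no NAs.
check_tax_rates <- function(df, lower = 10, upper = 40, max_spread = 5) {
  # 1. exactly one distinct rate in every observed country-year cell
  #    (empty cells come back as NA from tapply and are ignored)
  n_rates <- tapply(df$tax_rate, list(df$country, df$year),
                    function(x) length(unique(x)))
  same_per_cell <- all(n_rates == 1, na.rm = TRUE)
  # 2. every rate inside [lower, upper]
  in_range <- all(df$tax_rate >= lower & df$tax_rate <= upper)
  # 3. within each country, highest minus lowest rate is at most max_spread
  spread <- tapply(df$tax_rate, df$country, function(x) max(x) - min(x))
  small_spread <- all(spread <= max_spread)
  same_per_cell && in_range && small_spread
}

# Small made-up example
ok <- data.frame(country  = c("A", "A", "B", "B"),
                 year     = c(2005, 2010, 2005, 2010),
                 tax_rate = c(20, 24, 35, 35))
check_tax_rates(ok)    # TRUE
bad <- ok
bad$tax_rate[2] <- 26  # country A now spans 20 to 26
check_tax_rates(bad)   # FALSE
```

Any method for generating the rates can then be validated by running this check on its result.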
I'm not 100% sure what you are looking for, but you could use `dplyr`:
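library(dplyr)  # provides %>%, group_by() and mutate()

DT %>%
  group_by(country) %>%
  # one random base rate per country (runif(1, ...) is drawn once per group)
  mutate(base_rate = as.integer(runif(1, 12.5, 37.5))) %>%
  group_by(country, year) %>%
  # one random offset per country-year, added to the country's base rate
  mutate(tax_rate = base_rate + as.integer(runif(1, -2.5, +2.5)))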
which returns
# A tibble: 100 x 6
# Groups:   country, year [20]
   country  year uniqueID sales base_rate tax_rate
   <chr>   <dbl>    <int> <lgl>     <int>    <int>
 1 C        2005        1 NA           26       26
 2 C        2010        2 NA           26       26
 3 C        2010        3 NA           26       26
 4 C        2005        4 NA           26       26
 5 J        2005        5 NA           21       21
 6 J        2010        6 NA           21       20
 7 B        2010        7 NA           20       20
 8 B        2005        8 NA           20       22
 9 F        2010        9 NA           26       26
10 F        2005       10 NA           26       26
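I first created a random `base_rate` per country and then a random `tax_rate` per country and year. Because `as.integer()` truncates toward zero, `base_rate` is an integer from 12 to 37 and the offset added to it is an integer from -2 to 2, so every `tax_rate` lies between 10 and 39 and the rates of one country differ by at most 4 points.
I used integers, but you could easily switch to real-valued percentages.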
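As a sketch of the real-valued variant, the same two-step idea (a base rate per country, then an offset per country-year) can be written in base R, so it runs without `dplyr`. The helper name `add_tax_rate` and its arguments are hypothetical, rates are assumed to be in percent, and "5% apart" is again read as at most 5 percentage points.

```r
# Hypothetical helper (not from the answer): real-valued version of the
# two-step idea, in base R. Rates are in percent; max_spread is in points.
add_tax_rate <- function(df, lower = 10, upper = 40, max_spread = 5) {
  half <- max_spread / 2
  country <- as.character(df$country)        # character keys, even if a factor
  cell <- paste(country, df$year, sep = "_") # one key per country-year
  countries <- unique(country)
  cells <- unique(cell)
  # Step 1: one base rate per country, kept half a spread away from the bounds
  base <- setNames(runif(length(countries), lower + half, upper - half), countries)
  # Step 2: one offset per country-year, strictly between -half and half
  offset <- setNames(runif(length(cells), -half, half), cells)
  df$base_rate <- unname(base[country])
  df$tax_rate <- df$base_rate + unname(offset[cell])
  df
}

# Usage on a small made-up panel
set.seed(1)
panel <- data.frame(country = rep(c("A", "B", "C"), each = 4),
                    year    = rep(c(2005, 2010), times = 6))
res <- add_tax_rate(panel)
head(res)
```

The bounds do the work here: a base rate in [12.5, 37.5] plus an offset in (-2.5, 2.5) always stays within [10, 40], and two offsets from that interval differ by less than 5, so the within-country spread condition holds by construction.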