Reputation: 1
I'm working on a large data set of corporate account data to solve a classification problem: predicting whether a firm goes bankrupt or not.
The dataset contains a variable liquid, which states the year when the liquidation started. If a firm starts liquidation, this variable is filled in for every year of observation of that firm; otherwise it is zero. Usually, liquid is later than the last year of observation, so there are no observations of the corporate data in the year the firm starts liquidation. Sometimes there are even longer gaps. For example, a firm starts liquidation in 2005 but the last observation of the financial ratios is in 2002.
A sample of the data might look like this:
Now, I want to create a new dummy called bankruptcy. This should take the value 1 if it is the last observation (with financial data) of a company that starts liquidation. You can see what bankruptcy should look like in the table above. How do I proceed?
Upvotes: 0
Views: 113
Reputation: 889
There is probably a better way, but how about:
library(dplyr)
df <-structure(list(year = structure(c(1L, 2L, 3L, 2L, 3L, 4L, 5L, 2L, 3L), .Label = c("2000", "2001", "2002", "2003", "2004"), class = "factor"), liquid = structure(c(2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L), .Label = c("2003", "2005"), class = "factor"), company = structure(c(1L, 1L,
1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"), bankruptcy = c(0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("year", "liquid", "company", "bankruptcy"), row.names = c(NA, -9L), class = "data.frame")
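df %>%
  mutate(bankruptcy = 0) %>%
  group_by(company) %>%
  # set the last row of each company to 1 (assumes rows are sorted by year)
  mutate(bankruptcy = c(bankruptcy[-n()], 1)) %>%
  # reset companies that never start liquidation
  mutate(bankruptcy = ifelse(is.na(liquid), 0, bankruptcy))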
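The pipeline above depends on the rows already being sorted by year within each company. A minimal sketch that does not depend on row order, assuming year is numeric and liquid is NA for firms that never start liquidation (the small data frame here is only a stand-in for the sample data):

```r
library(dplyr)

# Stand-in for the sample data: same companies and years as the dput() output
df <- data.frame(
  company = c("A", "A", "A", "B", "B", "B", "B", "C", "C"),
  year    = c(2000, 2001, 2002, 2001, 2002, 2003, 2004, 2001, 2002),
  liquid  = c(2005, 2005, 2005, NA, NA, NA, NA, 2003, 2003)
)

result <- df %>%
  group_by(company) %>%
  # 1 only where the firm has a liquidation year AND the row is its latest year
  mutate(bankruptcy = as.integer(!is.na(liquid) & year == max(year))) %>%
  ungroup()
```

If year is stored as a factor, as in the dput() output, it would need converting first with as.numeric(as.character(year)).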
Upvotes: 0
Reputation: 819
If I understand you correctly from your desired output, you want bankruptcy to take on a 1 in the last observed year of each company that has a liquid value.
h/t to @user6617454 for the structure.
df <- structure(list(year = structure(c(1L, 2L, 3L, 2L, 3L, 4L, 5L, 2L, 3L), .Label = c("2000", "2001", "2002", "2003", "2004"), class = "factor"), liquid = structure(c(2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L), .Label = c("2003", "2005"), class = "factor"), company = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("year", "liquid", "company"), row.names = c(NA, -9L), class = "data.frame")
df$year <- as.numeric(as.character(df$year))
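# ave() returns the per-company maximum for every row;
# tapply() would return only one value per company and get recycled
df$maxyear <- ave(df$year, df$company, FUN = max)
df$bankruptcy <- ifelse(!is.na(df$liquid) & df$year == df$maxyear, 1, 0)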
In that solution, bankruptcy will take on a 1 when there is a liquid value for the company and the particular row has the latest year for that company. If your example isn't representative of your actual problem this might not work, but it did produce the output in your attached image.
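The question says liquid is zero, not NA, for firms that never start liquidation. A rough sketch that covers both codings, assuming numeric year and liquid columns; the helper name flag_last_obs and the small firms data frame are made up for illustration:

```r
# Hypothetical helper: flags the last observed year of each liquidating firm.
# Assumes numeric `year`, and `liquid` coded as 0 or NA when no liquidation started.
flag_last_obs <- function(data) {
  liquidating <- !is.na(data$liquid) & data$liquid != 0
  last_year   <- ave(data$year, data$company, FUN = max)  # per-row group maximum
  as.integer(liquidating & data$year == last_year)
}

# Made-up rows; company C is deliberately not sorted by year
firms <- data.frame(
  company = c("A", "A", "A", "B", "B", "C", "C"),
  year    = c(2000, 2001, 2002, 2003, 2004, 2002, 2001),
  liquid  = c(2005, 2005, 2005, 0, 0, 2003, 2003)
)
firms$bankruptcy <- flag_last_obs(firms)
```

Because the comparison is against the per-company maximum year, the rows do not need to be sorted beforehand.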
Upvotes: 0