Patrick

Reputation: 1

Creating a dummy with different arguments in R

I'm working with a large data set of corporate account data to solve a classification problem: whether or not a firm goes bankrupt.

The data set contains a variable, liquid, which gives the year in which liquidation started. If the firm does enter liquidation, this value is present in every year of observation; otherwise it is zero. Usually liquid is later than the last year of observation, so there are no observations of the corporate data in the year the firm starts liquidation. Sometimes the gap is even longer: for example, a firm starts liquidation in 2005 but the last observation of its financial ratios is in 2002.

A sample of the data might look like this:

[Image: sample table]

Now I want to create a new dummy variable called bankruptcy. It should take the value 1 on the last observation (with financial data) of a company that starts liquidation. The table above shows what bankruptcy should look like. How do I proceed?

Upvotes: 0

Views: 113

Answers (2)

CER

Reputation: 889

There is probably a better way, but how about:

library(dplyr)

df <-structure(list(year = structure(c(1L, 2L, 3L, 2L, 3L, 4L, 5L,  2L, 3L), .Label = c("2000", "2001", "2002", "2003", "2004"), class = "factor"), liquid = structure(c(2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L), .Label = c("2003",  "2005"), class = "factor"), company = structure(c(1L, 1L, 
1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor"),  bankruptcy = c(0, 0, 0, 0, 0, 0, 0, 0, 0)), .Names = c("year", "liquid", "company", "bankruptcy"), row.names = c(NA, -9L), class = "data.frame")



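df %>%
   mutate(bankruptcy = 0)  %>%                               # start with all zeros
   group_by(company) %>%
   mutate(bankruptcy = c(bankruptcy[-n()], 1)) %>%           # flag the last row of each company (assumes rows are sorted by year)
   mutate(bankruptcy = ifelse(is.na(liquid),0,bankruptcy))   # reset companies that never start liquidation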

Upvotes: 0

cparmstrong

Reputation: 819

If I understand you correctly from your desired output, you want bankruptcy to take on a 1 in the row with the latest year for each company that has a liquid value.

h/t to @user6617454 for the structure.

df <-structure(list(year = structure(c(1L, 2L, 3L, 2L, 3L, 4L, 5L,  2L, 3L), .Label = c("2000", "2001", "2002", "2003", "2004"), class = "factor"), liquid = structure(c(2L, 2L, 2L, NA, NA, NA, NA, 1L, 1L), .Label = c("2003",  "2005"), class = "factor"), company = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L), .Label = c("A", "B", "C"), class = "factor")), .Names = c("year", "liquid", "company"), row.names = c(NA, -9L), class = "data.frame")

df$year <- as.numeric(as.character(df$year))

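# ave() returns the group maximum for every row; tapply() would return only one value per company
df$maxyear <- ave(df$year, df$company, FUN = max)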
df$bankruptcy <- ifelse(!is.na(df$liquid) & df$year == df$maxyear, 
                        1, 
                        0)

In that solution, bankruptcy will take on a 1 when there is a liquid value for the company and the row's year is the latest one for that company. If your example isn't representative of your actual problem this might not work, but it did produce the output in your attached image.
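
One possible variation, offered as a sketch and not as the definitive approach: the question says liquid is zero for firms that never liquidate, while the sample structure above uses NA. The base R snippet below treats both encodings as "no liquidation" and does not depend on the rows being sorted by year. The toy data frame and the helper names liquidates and last_year are hypothetical and only mirror the sample.

```r
# Hypothetical toy data mirroring the sample; company B never liquidates,
# encoded here as 0 as described in the question text (NA is handled too).
df <- data.frame(
  company = c("A", "A", "A", "B", "B", "B", "B", "C", "C"),
  year    = c(2000, 2001, 2002, 2001, 2002, 2003, 2004, 2001, 2002),
  liquid  = c(2005, 2005, 2005, 0, 0, 0, 0, 2003, 2003)
)

# TRUE where the firm actually starts liquidation (value is neither NA nor 0)
liquidates <- !is.na(df$liquid) & df$liquid != 0

# ave() returns one value per row: the latest observed year of that row's company
last_year <- ave(df$year, df$company, FUN = max)

# 1 only on the last observed year of a company that starts liquidation
df$bankruptcy <- as.integer(liquidates & df$year == last_year)

print(df)
```

If your real data stores year or liquid as factors, convert them first with as.numeric(as.character(...)), as done above for year.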

Upvotes: 0
