Create a binary based on three other variables

Question

How can I programmatically calculate desired_output?

The basic structure of my data frame is as follows:

airline<-c(0,0,1,0,0,1)
city1<-c('a','a','a','b','b','c')
city2<-c('b','c','d','c','d','d')
desired_output<-c(0,1,1,0,0,1)

mktdf<-data.frame(airline, city1, city2, desired_output)

The airline dummy indicates whether an airline flies between city1 and city2. When it does not, I want to create a dummy that indicates whether the airline nevertheless flies from both city1 and city2 (just not between them).

For example, the airline does not fly BETWEEN a and b. It does, however, fly between a and d. On the other hand, it never flies from city b. Thus the first row of desired_output is 0.

In row 2 we observe 1 in desired_output. This is because we know the airline flies from city a, and later we see it also flies from city c (but again, not between them).

I'm happy to share any code I have written in attempting to solve this, though I was completely unsuccessful and I think it would just be distracting. Broadly speaking, I have tried using dplyr, looping, and the transform function.

Onyambu · Accepted Answer

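# Paste each city pair into one string: "ab", "ac", "ad", ...
a = paste0(city1, city2)

# Split the routes the airline flies into single characters (assumes one-character city names), then build every two-city combination
b = combn(unlist(strsplit(a[!!(airline)], "")), 2, paste0, collapse = "")

# Flag rows whose pair is among those combinations; + 0L converts TRUE/FALSE to 1/0
a %in% b + 0L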
[1] 0 1 1 0 0 1


mktdf$desired1=a%in%b+0L
> mktdf
  airline city1 city2 desired_output desired1
1       0     a     b              0        0
2       0     a     c              1        1
3       1     a     d              1        1
4       0     b     c              0        0
5       0     b     d              0        0
6       1     c     d              1        1
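
The approach above matches on pasted strings, so it depends on every city name being a single character and on the two cities appearing in the same order as in the generated combinations. Below is a minimal sketch of an alternative. It assumes the rule is "both city1 and city2 appear on at least one route the airline flies", which reproduces desired_output for the example data. The names flown, served and desired2 are made up for illustration and are not part of the original answer.

```r
# Rebuild the example data from the question.
# stringsAsFactors = FALSE keeps the cities as character vectors on older R versions.
airline <- c(0, 0, 1, 0, 0, 1)
city1 <- c("a", "a", "a", "b", "b", "c")
city2 <- c("b", "c", "d", "c", "d", "d")
mktdf <- data.frame(airline, city1, city2, stringsAsFactors = FALSE)

# Rows for routes the airline actually flies.
flown <- mktdf$airline == 1

# Every city that is an endpoint of at least one flown route.
served <- unique(c(mktdf$city1[flown], mktdf$city2[flown]))

# 1 when both endpoints are served cities, 0 otherwise.
mktdf$desired2 <- as.integer(mktdf$city1 %in% served & mktdf$city2 %in% served)

mktdf$desired2
# [1] 0 1 1 0 0 1
```

Because the comparison is done on whole city names rather than on characters, this sketch should also work for names such as "NYC" or "LAX", and it does not care which of the two cities is listed first.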
