Carlos

Reputation: 41

R: How to get the percentage change from two different columns

I am trying to solve this in R, but I can't seem to find the correct solution.

This is how my data looks:

Carrier Station Month   TYSeats LYSeats
AAL BSB 6   10560   10560
AAL BSB 7   10912   10912
AAL BSB 8   10560   9328
AAL BSB 9   9152    7392
AAL BSB 10  9328    9152
AAL BSB 11  8976    10384
AAL BSB 12  10208   10912
AAL CNF 6   12122   12644
AAL CNF 7   12958   13516
AAL CNF 8   10868   10138
AAL CNF 9   5434    5614
AAL CNF 10  5434    7630
AAL CNF 11  8987    9241
AAL CNF 12  12122   12958

I am using this code:

aggregate((TYSeats-LYSeats)/LYSeats~Carrier+Station,data=df,FUN=mean)

The result I expected looks like this, i.e. (sum(TYSeats) - sum(LYSeats)) / sum(LYSeats) for each Carrier and Station:

1              AAL  BSB                 0.015385  
2              AAL  CNF                -0.053191

But I am getting this instead (it averages the monthly ratios rather than taking the ratio of the sums):

1              AAL  BSB                 0.0270417328
2              AAL  CNF                -0.0603483997

Is there a way to accomplish what I need in a simple line/command?

Thanks!

Upvotes: 2

Views: 1468

Answers (5)

Seyma Kalay

Reputation: 2861

library(dplyr)

df.new <- df %>%
  group_by(Carrier, Station) %>%
  mutate(Max = max(TYSeats, LYSeats),
         Min = min(TYSeats, LYSeats),
         Diff.per = Max / Min - 1)

This way the percentage change always shows as a positive value.

Upvotes: 0

rafa.pereira

Reputation: 13827

A simple and fast data.table solution.

library(data.table)

setDT(df)

df[, .(PercentChange = sum(TYSeats - LYSeats) / sum(LYSeats)), by = .(Carrier, Station)]

Upvotes: 1

akrun

Reputation: 887991

We can use dplyr

library(dplyr)
df1 %>% 
   group_by(Carrier, Station) %>% 
   summarise(PercentChange = (sum(TYSeats) - sum(LYSeats))/sum(LYSeats))
# Carrier Station PercentChange
#    <chr>   <chr>         <dbl>
#1     AAL     BSB    0.01538462
#2     AAL     CNF   -0.05319134
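
For comparison, here is a minimal base R sketch of the same idea (sum each column per group first, then take the ratio of the sums), which stays close to the `aggregate` call in the question. The data frame name `seats` is an assumption for illustration; its values are re-typed from the question's table.

```r
# Data re-typed from the question; the name `seats` is hypothetical.
seats <- data.frame(
  Carrier = "AAL",
  Station = rep(c("BSB", "CNF"), each = 7),
  Month   = rep(6:12, times = 2),
  TYSeats = c(10560, 10912, 10560, 9152, 9328, 8976, 10208,
              12122, 12958, 10868, 5434, 5434, 8987, 12122),
  LYSeats = c(10560, 10912, 9328, 7392, 9152, 10384, 10912,
              12644, 13516, 10138, 5614, 7630, 9241, 12958)
)

# Step 1: sum both seat columns within each Carrier/Station group.
totals <- aggregate(cbind(TYSeats, LYSeats) ~ Carrier + Station,
                    data = seats, FUN = sum)

# Step 2: ratio of the sums, not the mean of the monthly ratios.
totals$PercentChange <- (totals$TYSeats - totals$LYSeats) / totals$LYSeats
totals
```

The question's original call applies `mean` to the per-month ratios, which weights every month equally regardless of how many seats it had. That is why its numbers differ from the ones above.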

Upvotes: 2

Bryan Goggin

Reputation: 2489

It is probably worth noting that if it is actually the percentage you are after, you should multiply by 100. Using @Psidom's code:

library(plyr)
ddply(df, .(Carrier, Station), summarise,
      PercentChange = (sum(TYSeats) - sum(LYSeats)) / sum(LYSeats) * 100)

  Carrier Station PercentChange
1     AAL     BSB      1.538462
2     AAL     CNF     -5.319134

For example, 1/4 is 25%, but

> 1/4
[1] 0.25
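
If the goal is only to display the result as a percentage, one option is to keep the proportion as the stored value and format it on output. This is a small sketch using base R's `sprintf`; the vector name `change` and the two-decimal format are assumptions, and the values are the proportions reported in the other answers.

```r
# Proportions taken from the other answers (BSB, CNF).
change <- c(BSB = 0.01538462, CNF = -0.05319134)

# Scale by 100 and append a literal percent sign ("%%" escapes "%").
labels <- sprintf("%.2f%%", 100 * change)
labels
# [1] "1.54%"  "-5.32%"
```

Keeping the numeric column as a proportion avoids accidentally multiplying by 100 twice in later calculations.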

Upvotes: 0

akuiper

Reputation: 215137

You can also use the ddply function from the plyr package:

library(plyr)
ddply(df, .(Carrier, Station), summarise,
      PercentChange = (sum(TYSeats) - sum(LYSeats)) / sum(LYSeats))

  Carrier Station PercentChange
1     AAL     BSB    0.01538462
2     AAL     CNF   -0.05319134

Upvotes: 1
