FilipeTeixeira
FilipeTeixeira

Reputation: 1160

Regroup, summarise and combine variables

I've been breaking my head to understand how to do this but so far I couldn't find an easy solution. I have the following dataset:

Itin   Origin  Destination  Passengers
1      A       B            1
1      B       C            1
2      A       B            3
3      E       B            10
4      A       C            2
5      E       B            4

What I'm trying to do is based on the Itin variable, to create a path variable, while keeping the passengers variable. The easiest way of understanding this is by seeing it as taking a normal flight with a scale somewhere. For example in Itin = 1 one passenger goes from A to B to C. The only thing that has to be kept is the Origin A Destination B, destination C and passengers as it is, which is equal to 1. Just like on the example below.

Path    Passengers
A-B-C   1
A-B     3
E-B     10
A-C     2
E-B     4

I've tried several options with group_by with dplyr, as it is often quicker than the base options, but I couldn't really get the result as on the second example with a new variable Path. I thought as well to use tidyr but I'm not really sure how it could help here. Any idea on how to do this?

Edit: As for the Path variable, it doesn't really matter if ends up as A-B-C, or A,B,C or A B C as I will only look at the syntax.

Upvotes: 0

Views: 396

Answers (2)

Pierre Lapointe
Pierre Lapointe

Reputation: 16277

EDIT A faster solution using data.table

df1<-read.table(text="Itin   Origin  Destination  Passengers
1      A       B            1
1      B       C            1
2      A       B            3
3      E       B            10
4      A       C            2
5      E       B            4",header=TRUE, stringsAsFactors=FALSE)

library(data.table)
DT <-data.table(df1)
DT[,.(Passengers, Path = paste(Origin[1],paste(Destination, collapse = " "),
                               collapse = " ")), by=Itin]

   Itin Passengers  Path
1:    1          1 A B C
2:    1          1 A B C
3:    2          3   A B
4:    3         10   E B
5:    4          2   A C
6:    5          4   E B

Here's my orignal solution with dplyr:

df1<-read.table(text="Itin   Origin  Destination  Passengers
1      A       B            1
1      B       C            1
2      A       B            3
3      E       B            10
4      A       C            2
5      E       B            4",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
df1 %>%
group_by(Itin) %>%
summarise(Passengers=max(Passengers),
          Path = paste(Origin[1],paste(Destination, collapse = " "),
                                collapse = " "))

# A tibble: 5 × 3
   Itin Passengers  Path
  <int>      <int> <chr>
1     1          1 A B C
2     2          3   A B
3     3         10   E B
4     4          2   A C
5     5          4   E B

Upvotes: 2

thc
thc

Reputation: 9705

Reading data:

read.table(textConnection("Itin   Origin  Destination  Passengers
1      A       B            1
1      B       C            1
2      A       B            3
3      E       B            10
4      A       C            2
5      E       B            4"), header=T, stringsAsFactors=F) -> df

Using base R in this case:

Path <- lapply(unique(df$Itin), function(it) {
    x <- subset(df, Itin==it)
    c(x$Origin[1], x$Destination)
})
new_df <- unique(df[,c("Itin", "Passengers")]) 
new_df$Path <- Path

> new_df
  Itin Passengers    Path
1    1          1 A, B, C
3    2          3    A, B
4    3         10    E, B
5    4          2    A, C
6    5          4    E, B

Upvotes: 0

Related Questions