Reputation: 1160
I've been breaking my head to understand how to do this but so far I couldn't find an easy solution. I have the following dataset:
Itin Origin Destination Passengers
1 A B 1
1 B C 1
2 A B 3
3 E B 10
4 A C 2
5 E B 4
What I'm trying to do is based on the Itin
variable, to create a path variable, while keeping the passengers variable.
The easiest way of understanding this is by seeing it as taking a normal flight with a scale somewhere. For example in Itin = 1
one passenger goes from A to B to C. The only thing that has to be kept is the Origin A
Destination B
, destination C
and passengers as it is, which is equal to 1. Just like on the example below.
Path Passengers
A-B-C 1
A-B 3
E-B 10
A-C 2
E-B 4
I've tried several options with group_by with dplyr, as it is often quicker than the base options, but I couldn't really get the result as on the second example with a new variable Path. I thought as well to use tidyr but I'm not really sure how it could help here. Any idea on how to do this?
Edit: As for the Path variable, it doesn't really matter if ends up as A-B-C
, or A,B,C
or A B C
as I will only look at the syntax.
Upvotes: 0
Views: 396
Reputation: 16277
EDIT A faster solution using data.table
df1<-read.table(text="Itin Origin Destination Passengers
1 A B 1
1 B C 1
2 A B 3
3 E B 10
4 A C 2
5 E B 4",header=TRUE, stringsAsFactors=FALSE)
library(data.table)
DT <-data.table(df1)
DT[,.(Passengers, Path = paste(Origin[1],paste(Destination, collapse = " "),
collapse = " ")), by=Itin]
Itin Passengers Path
1: 1 1 A B C
2: 1 1 A B C
3: 2 3 A B
4: 3 10 E B
5: 4 2 A C
6: 5 4 E B
Here's my orignal solution with dplyr
:
df1<-read.table(text="Itin Origin Destination Passengers
1 A B 1
1 B C 1
2 A B 3
3 E B 10
4 A C 2
5 E B 4",header=TRUE, stringsAsFactors=FALSE)
library(dplyr)
df1 %>%
group_by(Itin) %>%
summarise(Passengers=max(Passengers),
Path = paste(Origin[1],paste(Destination, collapse = " "),
collapse = " "))
# A tibble: 5 × 3
Itin Passengers Path
<int> <int> <chr>
1 1 1 A B C
2 2 3 A B
3 3 10 E B
4 4 2 A C
5 5 4 E B
Upvotes: 2
Reputation: 9705
Reading data:
read.table(textConnection("Itin Origin Destination Passengers
1 A B 1
1 B C 1
2 A B 3
3 E B 10
4 A C 2
5 E B 4"), header=T, stringsAsFactors=F) -> df
Using base R in this case:
Path <- lapply(unique(df$Itin), function(it) {
x <- subset(df, Itin==it)
c(x$Origin[1], x$Destination)
})
new_df <- unique(df[,c("Itin", "Passengers")])
new_df$Path <- Path
> new_df
Itin Passengers Path
1 1 1 A, B, C
3 2 3 A, B
4 3 10 E, B
5 4 2 A, C
6 5 4 E, B
Upvotes: 0