Reputation: 401

R - Compare values in two columns in different rows

I have a dataframe df as seen below with two features, a departing city and an arrival city. Every two rows information is stored about a going and a return flight.

  Departure Arrival
1    A          B
2    B          A
3    F          G
4    G          F
5    U          V
6    V          U
7    K          L
8    K          L

There is some inconsistency in the data where the same flight is repeated as it can be seen in the last two rows.

How is it possible to compare for every two rows the departure city of the first row and the arrival city of the second row, and keep the ones that are equal. The dataset is very big and of course a for-loop is not considered an option.

Thank you in advance.

Upvotes: 0

Answers (5)

CPak

Reputation: 13581

Try the following solution, if it works for you:

df[duplicated(paste0(df$Departure,df$Arrival))==F,]

Upvotes: 0

Lamia

Reputation: 3875

You could also do it this way:

right = rep(df[c(T,F),"Arrival"]==df[c(F,T),"Departure"],each=2)
df[right,]

This returns:

   Departure Arrival
1          A       B
2          B       A
3          F       G
4          G       F
5          U       V
6          V       U

Upvotes: 1

Olivia

Reputation: 814

This answer doesn't look for unique records, it specifically checks if a row is a duplicate of the row before.

Adding a new column with a 1 if the row has repeated:

 for(i in 2:length(df$Departure)){df$test[i]=ifelse(df$Departure[i] == df$Departure[i-1] & df$Arrival[i] == df$Arrival[i-1], 1,0)}

Loops can be slow though:

library(data.table)

df$test2 = ifelse(df$Departure == shift(df$Departure) & df$Arrival == shift(df$Arrival), 1,0)

Upvotes: 0

lmo

Reputation: 38500

Here is a method that compares the pairs of rows using head and tail to line them up.

# find Departures that match the Arrival in the next row
sames <- which(head(dat$Departure, -1) == tail(dat$Arrival, -1))
# keep pairs of rows that match, maintaining order with `sort`
dat[sort(unique(c(sames, (sames + 1)))),]
  Departure Arrival
1         A       B
2         B       A
3         F       G
4         G       F
5         U       V
6         V       U

Note that the two variables have to be character vectors, not factor variables. you can coerce them to character using as.character if necessary.

data

dat <-
structure(list(Departure = c("A", "B", "F", "G", "U", "V", "K", 
"K"), Arrival = c("B", "A", "G", "F", "V", "U", "L", "L")), .Names = c("Departure", 
"Arrival"), class = "data.frame", row.names = c("1", "2", "3", 
"4", "5", "6", "7", "8"))

Upvotes: 3

ike

Reputation: 312

So you just want unique flight paths? there are a number of ways to do this, I'd think the fastest would be with data.table, something like:

 library(data.table)
 df <- as.data.table(df)

 uniqueDf <- unique(df)

you can also use the duplicated function, something like

 df <- df[!duplicated(df), ]

should do nicely.

Upvotes: 2

R - Compare values in two columns in different rows

Answers (5)

Related Questions