Luther_Blissett
Luther_Blissett

Reputation: 327

How to do a left (or right) exclude join in R?

What is the R equivalent of this?

SELECT <select_list> 
FROM Table_A A
LEFT JOIN Table_B B
ON A.Key = B.Key
WHERE B.Key IS NULL

See https://www.codeproject.com/articles/33052/visual-representation-of-sql-joins - 'Left excluding JOIN'

I have two columns of data:

A
Amsterdam
Copenhagen
LA
Lisbon
London 
Madrid
New York
Paris
Rome
Stockholm

and

B
Amsterdam
Buenos Aires  
Copenhagen
LA
London 
Sydney
Tokyo

How do I return the lists:

In A ONLY
Lisbon
Madrid
New York
Paris
Rome
Stockholm

and

In B ONLY
Buenos Aires
Sydney
Tokyo

in R?

I've been trying and searching for solutions using the merge() function for about three hours.

Similarly, is there a way to do a left join in R that returns both columns?

For example, if I call:

A <- data.frame(A = c("Amsterdam",
                      "Copenhagen",
                      "LA",
                      "Lisbon",
                      "London", 
                      "Madrid",
                      "New York",
                      "Paris",
                      "Rome",
                      "Stockholm"))

B <- data.frame(B = c("Amsterdam",
                      "Buenos Aires",
                      "Copenhagen",
                      "LA",
                      "London", 
                      "Sydney",
                      "Tokyo"))

merge(A, B, by.x = "A", by.y = "B", all.x = T)

I get

            A
1   Amsterdam
2  Copenhagen
3          LA
4      Lisbon
5      London
6      Madrid
7    New York
8       Paris
9        Rome
10  Stockholm

But I want

            A   B
1   Amsterdam   Amsterdam
2  Copenhagen   Copenhagen
3          LA   LA
4      Lisbon   NULL
5      London   London 
6      Madrid   NULL
7    New York   NULL
8       Paris   NULL
9        Rome   NULL
10  Stockholm   NULL

Upvotes: 0

Views: 3599

Answers (5)

Carlos Silva
Carlos Silva

Reputation: 1

Sorry, my first comment on here so I dont know how to correctly format answer to look like R yet

Use anti_join(A,B) and the anti_join(B,A). The results:

anti_join(A,B)
Joining, by = "Y"
Y
1 Lisbon
2 Madrid
3 New York
4 Paris
5 Rome
6 Stockholm

anti_join(B,A)
Joining, by = "Y"
Y
1 Buenos Aires
2 Sydney
3 Tokyo

And to merge them two by matches:

aux<- inner_join(A,B)
aux2<- anti_join(B,A)
i<-1:nrow(A)
c<-ifelse(A$Y[i] %in% aux$Y, A$Y[i],"NA")
A<-cbind(A,c)
A<-rbind(A,data.frame(Y=rep("NA",nrow(anti_join(B,A))),c=aux2$Y))
A
Y c
1 Amsterdam Amsterdam
2 Copenhagen Copenhagen
3 LA LA
4 Lisbon NA
5 London London
6 Madrid NA
7 New York NA
8 Paris NA
9 Rome NA
10 Stockholm NA
11 NA Buenos Aires
12 NA Sydney
13 NA Tokyo

Upvotes: 0

Daniel Peter
Daniel Peter

Reputation: 1

For excluding the city mentioned in b from a. a is left table and b is right table. the below is similar to left excluding join. Left excluding join = Left join - inner join.

a <- data.frame(a);
colnames(a) <- city;

b <- data.frame(b);
colnames(b) <- city;
b$indicator <- c("indicator");

a_excl_b <- merge(x = a, y = b, all = FALSE, all.x = TRUE, all.y = FALSE);
a_excl_b <- subset(a_excl_b, is.na(indicator));
a_excl_b <- data.frame(a_excl_b$city);
colnames(a_excl_b ) <- c("city");

Upvotes: 0

s_baldur
s_baldur

Reputation: 33488

You could do something like the following:

df <- data.frame(A, B = "NULL", stringsAsFactors = FALSE)
isinB <- df$A %in% B$B
df[isinB, "B"] <- as.character(df[isinB, "A"])


print(df)
            A          B
1   Amsterdam  Amsterdam
2  Copenhagen Copenhagen
3          LA         LA
4      Lisbon       NULL
5      London     London
6      Madrid       NULL
7    New York       NULL
8       Paris       NULL
9        Rome       NULL
10  Stockholm       NULL

Upvotes: 1

akrun
akrun

Reputation: 887068

We create a column of 'A' in 'B' and then do the merge

merge(A, transform(B, A = B), all.x = TRUE)
#            A          B
#1   Amsterdam  Amsterdam
#2  Copenhagen Copenhagen
#3          LA         LA
#4      Lisbon       <NA>
#5      London     London
#6      Madrid       <NA>
#7    New York       <NA>
#8       Paris       <NA>
#9        Rome       <NA>
#10  Stockholm       <NA>

Upvotes: 1

G5W
G5W

Reputation: 37641

## A only
setdiff(A,B)
[1] "Lisbon"    "Madrid"    "New York"  "Paris"     "Rome"      "Stockholm"

## B only
setdiff(B,A)
[1] "Buenos Aires" "Sydney"       "Tokyo" 

Upvotes: 2

Related Questions