Sabrina
Sabrina

Reputation: 21

How can I subset a data frame with a second data frame

I have two sets of data that were imported. The second set of data has only one column which corresponds with 44 out of 86 of the rows in the first data set. How can I form a new data set which only involves the data of the rows that appear in both data sets. The names of the columns are different.

I am not in a position to publish the data being use, there for code below is along the same lines but not exactly the same. However, as I am trying to select 44 rows choosing the lines isn't an option.

df1:

Fruit        Circumference      Weight 
Apple        10                 20
Orange       13                 30
Pear         10                 21
Kiwi         7                  15
Cherry       3                  10
Strawberry   4                  13
Blueberry    2                  5

df2:

 Fruit Name 
 Orange 
 Cherry 
 Blueberry 

New data:

Fruit        Circumference      Weight 
Orange       13                 30
Cherry       3                  10
Blueberry    2                  5

Upvotes: 0

Views: 92

Answers (2)

akrun
akrun

Reputation: 886958

We can use inner_join. It also have an option to check with only selected columns with by argument

library(dplyr)
inner_join(df1, df2)

Or another option is intersect from dplyr

intersect(df1, df2)

With the posted data it wold be

inner_join(df1, df2, by = c("Fruit" = "Fruit Name"))

Upvotes: 1

Rui Barradas
Rui Barradas

Reputation: 76402

Here are three base R ways.
Note that the second data.frame column name has an embedded space in it. The column name must, therefore, be between backticks in the first two ways and in the 3rd arguments by.x and by.y are used.

df1[df1$Fruit %in% df2$`Fruit Name`, ]
#      Fruit Circumference Weight
#2    Orange            13     30
#5    Cherry             3     10
#7 Blueberry             2      5


df1[match(df2$`Fruit Name`, df1$Fruit), ]
#      Fruit Circumference Weight
#2    Orange            13     30
#5    Cherry             3     10
#7 Blueberry             2      5


merge(df1, df2, by.x = "Fruit", by.y = "Fruit Name")
#      Fruit Circumference Weight
#1 Blueberry             2      5
#2    Cherry             3     10
#3    Orange            13     30

Data

df1 <- read.table(text = "
Fruit        Circumference      Weight 
Apple        10                 20
Orange       13                 30
Pear         10                 21
Kiwi         7                  15
Cherry       3                  10
Strawberry   4                  13
Blueberry    2                  5
", header = TRUE)

df2 <- read.table(text = "
 'Fruit Name'
 Orange 
 Cherry 
 Blueberry 
", header = TRUE, check.names = FALSE)

new <- read.table(text = "
Fruit        Circumference      Weight 
Orange       13                 30
Cherry       3                  10
Blueberry    2                  5
", header = TRUE)

Upvotes: 2

Related Questions