wespiserA
wespiserA

Reputation: 3168

how to select rows from a dataframe1 in R where dataframe$1column is found somewhere in dataframe2$column

I need to create a new dataframe from rows from dataframe1, such that the value of dataframe1$column is a value found in dataframe2$colum

the dataframes are:

y <- "name,number,lunch  
joe,2,peaches  
steve,5,hotdog  
jon,7,clamroll  
nick,11,sloppyJoe"


x <- "number,office  
1,1b  
2,1a  
3,2s  
4,4d  
5,f4  
6,f4  
7,h3  
8,g3  
9,j7  
10,d3 
11,jk"  

df1 <- read.csv(textConnection(df1), header=TRUE, sep=",", stringsAsFactors=FALSE)  
df2 <- read.csv(textConnection(df2), header=TRUE, sep=",", stringsAsFactors=FALSE)

I have tried:

df3 <- df1[which(df1$number == df2$number), ]

to no avail. How do I properly do this in R? I could write a perl script, but I have about 100 of these sets and don't want to create more temp files.

Upvotes: 1

Views: 12217

Answers (2)

Chase
Chase

Reputation: 69201

Joris' answer is spot on. The merge() command can also be useful for this type of stuff. If you are familiar with SQL joins, you can draw parallels between most of the options in merge() and the different join operations.

#Inner join
> merge(df1,df2)
  number office  name      lunch
1      2   1a     joe  peaches  
2      5   f4   steve   hotdog  
3      7   h3     jon clamroll  
4     11     jk  nick  sloppyJoe

#Right join:
> merge(df1,df2, all.x = TRUE)
   number office  name      lunch
1       1   1b    <NA>       <NA>
2       2   1a     joe  peaches  
3       3   2s    <NA>       <NA>
4       4   4d    <NA>       <NA>
5       5   f4   steve   hotdog  
6       6   f4    <NA>       <NA>
7       7   h3     jon clamroll  
8       8   g3    <NA>       <NA>
9       9   j7    <NA>       <NA>
10     10    d3   <NA>       <NA>
11     11     jk  nick  sloppyJoe

Upvotes: 1

Joris Meys
Joris Meys

Reputation: 108543

again, the %in% trick :

> df1[df1$number %in% df2$number,]
   number office
2       2     1a
5       5     f4
7       7     h3
11     11     jk

For what it's worth, you can easily just do a merge if you want to combine them. In this case I'd say that's the cleanest solution : gives you every office of the occuring employees, and matches them :

> merge(df1,df2)
  number office  name     lunch
1      2     1a   joe   peaches
2      5     f4 steve    hotdog
3      7     h3   jon  clamroll
4     11     jk  nick sloppyJoe

Check the help files of merge for more options, you can do a whole lot with it.

Upvotes: 5

Related Questions