Adam Warner

Reputation: 1354

Matching Multiple Rows To Find A Value - R

I think this is similar to, but not the same as, a previous question I asked here: Pull specific rows

Here is the code that I am now working with:

City <- c("x","x","y","y","z","z")
Type <- c("a","b","a","b","a","b")
Value <- c(1,3,2,5,6,10)
cbind.data.frame(City,Type,Value)

Which produces:

    City Type Value
1    x    a     1
2    x    b     3
3    y    a     2
4    y    b     5
5    z    a     6
6    z    b    10

I want to do something similar to before, but now two different conditions must be met to pull a specific number. Let's say we have a matrix:

testmat <- matrix(c("x","x","y","a","b","b"),ncol=2)

Which looks like this:

    [,1] [,2]
[1,] "x"  "a" 
[2,] "x"  "b" 
[3,] "y"  "b" 

The desired outcome is

     [,1] [,2] [,3]
[1,] "x"  "a"   1 
[2,] "x"  "b"   3 
[3,] "y"  "b"   5

Second question (this is the part I most need answered)

library(dplyr)  # provides inner_join()

City <- c("x","x","x","x","y","y","x","z")
Type <- c("a","a","a","a","a","b","a","b")

Value <- c(1,3,2,5,6,10,11,15)

mat <- cbind.data.frame(City,Type,Value)
mat
testmat <- matrix(c("y","x","b","a"),ncol=2)
testmat <- data.frame(testmat)
testmat

test <- inner_join(mat,testmat,by = c("City"="X1", "Type"="X2"))

Why does inner_join give me a warning when I run this? Here is the warning message:

In inner_join_impl(x, y, by$x, by$y) : joining factors with different levels, coercing to character vector

The desired output is:

    City Type Value
1    y    b    10
2    x    a     1
3    x    a     3
4    x    a     2
5    x    a     5
6    x    a    11

but it is producing:

    City Type Value
1    x    a     1
2    x    a     3
3    x    a     2
4    x    a     5
5    y    b    10
6    x    a    11

I want inner_join to return the rows in the order in which their keys first appear in testmat, as shown above. Since City "y" of Type "b" comes first in testmat, I want it to come first in "test".

Upvotes: 0

Views: 569

Answers (3)

MarkusN

Reputation: 3223

Answer to the second part: the warning says that you are joining on two factors with different levels. The variables are therefore coerced to character before joining; there is no problem with that. As Mostafa Rezaei mentioned in his answer, R converts character vectors to factors when creating a data frame. It is usually best to keep them as characters:

mat <- data.frame(City, Type, Value, stringsAsFactors = FALSE)
testmat <- data.frame(testmat, stringsAsFactors = FALSE)
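
To make the point about factor levels concrete, here is a minimal base R sketch. The names df_chr, df_fct, mat_city and key_city are made up for illustration; they are not from the question.

```r
# Hypothetical example data, not the asker's objects.
City <- c("x", "y", "z")

# With stringsAsFactors = FALSE the column stays a character vector.
df_chr <- data.frame(City, stringsAsFactors = FALSE)
class(df_chr$City)

# With stringsAsFactors = TRUE it becomes a factor.
df_fct <- data.frame(City, stringsAsFactors = TRUE)
class(df_fct$City)

# Two factors built from different sets of values carry different levels,
# which is the situation the join warning describes.
mat_city <- factor(c("x", "y", "z"))
key_city <- factor(c("y", "x"))
same_levels <- identical(levels(mat_city), levels(key_city))
same_levels
```

If both join columns are plain characters, there are no levels to reconcile, so the coercion warning should not come up.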


Concerning your real question:

The order of the result of a join is not defined. If order is crucial to you, you can use an additional sorting variable:

mat %>%
  mutate(rn = row_number()) %>%
  semi_join(testmat, by = c("City"="X1", "Type"="X2")) %>%
  arrange(rn)

By the way, I think you're looking for a semi_join rather than an inner_join; read the help file for the differences.
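
A small sketch of that difference, assuming dplyr is installed. The data frames lookup_from and keys are invented for illustration: when the key table contains a repeated key, inner_join returns one row per match, while semi_join only filters the left table.

```r
library(dplyr)

# Hypothetical data: one row per City/Type pair on the left.
lookup_from <- data.frame(City = c("x", "y", "z"),
                          Type = c("a", "b", "b"),
                          Value = c(1, 10, 15),
                          stringsAsFactors = FALSE)

# Hypothetical key table in which the same key appears twice.
keys <- data.frame(X1 = c("y", "y"),
                   X2 = c("b", "b"),
                   stringsAsFactors = FALSE)

# inner_join: the matching left row is repeated once per matching key row.
inner_res <- inner_join(lookup_from, keys, by = c("City" = "X1", "Type" = "X2"))

# semi_join: the left row is kept once, and no columns from keys are added.
semi_res <- semi_join(lookup_from, keys, by = c("City" = "X1", "Type" = "X2"))

nrow(inner_res)
nrow(semi_res)
```

So semi_join is the safer choice when the second table is only meant to act as a filter.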

Upvotes: 0

Mostafa Rezaei

Reputation: 629

The warning occurs because data.frame() converts character vectors to factors by default (this was the default before R 4.0.0). You can change this behaviour by running the following code at the start of your script:

 options(stringsAsFactors = FALSE)

Upvotes: 0

Adam Warner

Reputation: 1354

The solution is to just switch the order of testmat and mat, like so:

test <- inner_join(testmat,mat,by = c("X1"="City", "X2"="Type"))

I find it interesting that the names in the by argument need to be given in the same order as the data frames passed to inner_join.
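
For completeness, a self-contained sketch of the swapped join using the data from the question, assuming dplyr is installed. It relies on inner_join keeping the row order of its first argument, which matches what I observed, so treat the ordering as an assumption to check against your dplyr version.

```r
library(dplyr)

City <- c("x", "x", "x", "x", "y", "y", "x", "z")
Type <- c("a", "a", "a", "a", "a", "b", "a", "b")
Value <- c(1, 3, 2, 5, 6, 10, 11, 15)
mat <- data.frame(City, Type, Value, stringsAsFactors = FALSE)

# Key table: ("y", "b") first, then ("x", "a").
testmat <- data.frame(matrix(c("y", "x", "b", "a"), ncol = 2),
                      stringsAsFactors = FALSE)

# testmat goes first, so its rows drive the order of the result.
# Each name in `by` has the form "column in testmat" = "column in mat".
test <- inner_join(testmat, mat, by = c("X1" = "City", "X2" = "Type"))
test
```

Note that the key columns in the result keep the names from testmat (X1 and X2) rather than City and Type.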

Upvotes: 2
