I have a dataframe in R with approximately 500 rows and the columns x, y, z, and a matrix with a single column and three rows; call them a, b, c. I want to filter the dataframe based on the values in the matrix rows. Specifically, I want to find the row in the dataframe whose value in column x is closest to the value in row a of the matrix, whose value in column y is closest to row b, and whose value in column z is closest to row c. I feel that this should be quite straightforward, but I must be missing something.
Basically, I just need to return the row in the dataframe whose values most closely match the data in the matrix, so I can determine which row is most representative of it.
Here's an example:
x <- c(52, -36, 45, 756, 12, 23, 45)
y <- c(34, 56, 68, 23, -4, 2, 23)
z <- c(-1, 2, 5, 4, 6, -4, 3)
df <- data.frame(x, y, z)
vector <- c(-60,20,7)
I want to filter df based on the values in vector so that it returns a single row whose values across the three columns most closely match vector.
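The question first describes the targets as a one-column matrix with rows a, b, c, while the example uses a plain vector. A minimal sketch of getting from one to the other, assuming a hypothetical matrix `m` built from the example values: extracting the single column gives a length-3 vector whose order must match the column order of df (a goes with x, b with y, c with z).

```r
# Hypothetical one-column matrix with rows named a, b, c, as described in the question
m <- matrix(c(-60, 20, 7), ncol = 1, dimnames = list(c("a", "b", "c"), NULL))

# Extract the single column: a plain numeric vector of length 3 that keeps the row names
vector <- m[, 1]

# Element j of vector is the target for column j of df, so the orders must line up
print(vector)
```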
One way is to subtract vector from the dataframe column by column (column 1 minus vector[1], column 2 minus vector[2], and so on), take the absolute values, sum the differences row-wise, and select the row with the minimum sum:
df[which.min(rowSums(abs(sweep(df, 2, vector, `-`)))), ]
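To make the one-liner easier to follow, here is a sketch that splits it into its steps on the example data and prints the per-row distances. The sum of absolute differences is the Manhattan (L1) distance; the squared-difference version at the end is a swapped-in alternative (Euclidean distance), not part of the original answer.

```r
x <- c(52, -36, 45, 756, 12, 23, 45)
y <- c(34, 56, 68, 23, -4, 2, 23)
z <- c(-1, 2, 5, 4, 6, -4, 3)
df <- data.frame(x, y, z)
vector <- c(-60, 20, 7)

# MARGIN = 2: subtract vector[j] from every value in column j
diffs <- sweep(df, 2, vector, `-`)

# Manhattan (L1) distance of each row to the target
l1 <- rowSums(abs(diffs))
print(l1)  # 134 65 155 822 97 112 112, one value per row

# which.min() returns the first row with the smallest distance (row 2 here)
best_l1 <- df[which.min(l1), ]
print(best_l1)

# Alternative: Euclidean (L2) distance, which penalises one large gap more heavily
l2 <- sqrt(rowSums(diffs^2))
best_l2 <- df[which.min(l2), ]
print(best_l2)
```

Both metrics pick row 2 (x = -36, y = 56, z = 2) for this data. Note that neither rescales the columns, so a column with a much wider range (x here) dominates the distance; standardising the columns first is one option if that is not wanted.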