Simon Harmel

Reputation: 1489

Find rows whose column values are closest to those of a specific row in a data.frame

Imagine we have one row in the data below as our reference (row # 116).

How can I find any other rows in this data whose column values are the same as, or the closest to, the column values of this reference row? (If a column is numeric, let's say a difference of up to +/- 3 is an acceptable match.)

For example, if the column value for variable prof in the reference row is beginner, we want to find another row whose value for prof is also beginner.

Or, if the value of variable study_length in the reference row is 5, we want to find another row whose value for study_length is within 5 +/- 3, and so on.

Is it possible to set up a function to do this in R?

data <- read.csv("https://raw.githubusercontent.com/hkil/m/master/wcf.csv")[-c(2:6,12,17)]

reference <- data[116,]

############################# YOUR POSSIBLE ANSWER:

foo <- function(x = data, reference_row = 116, tolerance_for_numerics = 3) {

# your solution


}

# Example of use:

foo()

Upvotes: 0

Views: 109

Answers (1)

Rui Barradas

Reputation: 76402

Here is a solution.

foo <- function(x = data, reference_row = 116, tolerance_for_numerics = 3) {
  # which columns are numeric
  i <- sapply(x, is.numeric)
  reference <- x[reference_row, ]
  # numeric columns must be within the tolerance of the reference value
  num <- mapply(\(y, ref, tol) {
    y >= ref - tol & y <= ref + tol
  }, x[i], reference[i], MoreArgs = list(tol = tolerance_for_numerics))
  # other columns must match exactly (?)
  other <- mapply(\(y, ref) {
    y == ref
  }, x[!i], reference[!i])
  # keep the rows where every column matches the reference
  which(rowSums(cbind(other, num)) == ncol(x))
}

data <- read.csv("https://raw.githubusercontent.com/hkil/m/master/wcf.csv")[-c(2:6,12,17)]

# Example of use:
foo()
#> [1] 112 114 116

Created on 2022-08-13 by the reprex package (v2.0.1)
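
To check the matching rule without downloading the CSV, here is a minimal sketch on made-up data. The data frame `toy` and the helper name `find_close_rows` are hypothetical and not part of the original post; only the column names prof and study_length come from the question. It assumes that a missing value should count as a non-match, which the question does not specify.

```r
# Hypothetical toy data standing in for the CSV in the question.
toy <- data.frame(
  prof         = c("beginner", "beginner", "advanced", "beginner"),
  study_length = c(5, 7, 5, 9),
  stringsAsFactors = FALSE
)

# Hypothetical helper: same idea as foo(), written against its arguments only.
find_close_rows <- function(x, reference_row, tolerance_for_numerics = 3) {
  is_num <- vapply(x, is.numeric, logical(1))
  reference <- x[reference_row, , drop = FALSE]
  # Build one logical column per variable: TRUE where the row matches.
  matches <- do.call(cbind, Map(function(y, ref, num) {
    if (num) abs(y - ref) <= tolerance_for_numerics else y == ref
  }, x, reference, is_num))
  # Assumption: an NA cannot be confirmed as a match, so treat it as FALSE.
  matches[is.na(matches)] <- FALSE
  # Keep the rows where every column matches the reference row.
  which(rowSums(matches) == ncol(x))
}

find_close_rows(toy, reference_row = 1)
#> [1] 1 2
```

Row 3 drops out because prof differs, and row 4 because study_length is 4 away from the reference, which exceeds the default tolerance of 3. As in foo(), the reference row itself is always part of the result.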

Upvotes: 1
