Imagine we have one row in the data below as our reference (row #116).
How can I find any other rows in this data whose columns' values are the same as, or the closest to, the columns' values of this reference row? For numeric columns, let's say a difference of up to +/- 3 is an acceptable match.
For example, if the column value for variable prof in the reference row is beginner, we want to find another row whose value for prof is also beginner. Or if the column value for variable study_length in the reference row is 5, we want to find another row whose value for study_length is 5 +/- 3, and so on.
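To make the numeric rule concrete: reading "5 +/- 3" as an inclusive window means 2 and 8 still count as matches while 1 and 9 do not. A minimal sketch with made-up values (the study_length vector here is hypothetical, not taken from wcf.csv):

```r
# Toy values for illustration only (not taken from wcf.csv)
ref <- 5
study_length <- c(1, 2, 5, 8, 9)
tolerance <- 3
# Inclusive window: a value matches if it lies in [ref - tolerance, ref + tolerance]
within_tol <- abs(study_length - ref) <= tolerance
within_tol
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
```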
Is it possible to set up a function to do this in R?
data <- read.csv("https://raw.githubusercontent.com/hkil/m/master/wcf.csv")[-c(2:6,12,17)]
reference <- data[116,]
############################# YOUR POSSIBLE ANSWER:
foo <- function(data = data, reference_row = 116, tolerance_for_numerics = 3) {
# your solution
}
# Example of use:
foo()
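Here is a solution. Numeric columns match when they lie within the tolerance of the reference value; all other columns must match exactly. The returned row indices include the reference row itself.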
foo <- function(x = data, reference_row = 116, tolerance_for_numerics = 3) {
  # which columns are numeric
  i <- sapply(x, is.numeric)
  reference <- x[reference_row, ]
  # numeric columns must lie within reference +/- tolerance
  num <- mapply(\(y, ref, tol) {
    y >= ref - tol & y <= ref + tol
  }, x[i], reference[i], MoreArgs = list(tol = tolerance_for_numerics))
  # non-numeric columns must match exactly
  other <- mapply(\(y, ref) {
    y == ref
  }, x[!i], reference[!i])
  # keep rows where every column matches
  which(rowSums(cbind(other, num)) == ncol(x))
}
data <- read.csv("https://raw.githubusercontent.com/hkil/m/master/wcf.csv")[-c(2:6,12,17)]
# Example of use:
foo()
#> [1] 112 114 116
Created on 2022-08-13 by the reprex package (v2.0.1)
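Since the question asks for the other matching rows, here is a self-contained variant of the same matching rule that drops the reference row and runs on a small made-up data frame instead of wcf.csv. The function name find_similar_rows and the toy data are hypothetical; treat it as a sketch, not something tested against the real dataset.

```r
# Hypothetical helper (not from the answer above): applies the same matching
# rule as foo() but returns only the other matching rows.
find_similar_rows <- function(x, reference_row, tolerance = 3) {
  # which columns are numeric
  is_num <- vapply(x, is.numeric, logical(1))
  ref <- x[reference_row, , drop = FALSE]
  # one logical vector per column: does each row match the reference value?
  matches <- Map(function(col, ref_val, num_col) {
    if (num_col) abs(col - ref_val) <= tolerance else col == ref_val
  }, x, ref, is_num)
  # a row qualifies only if it matches in every column
  hits <- which(Reduce(`&`, matches))
  # drop the reference row itself
  setdiff(hits, reference_row)
}

# Made-up toy data standing in for wcf.csv
toy <- data.frame(
  prof = c("beginner", "advanced", "beginner", "beginner"),
  study_length = c(5, 6, 8, 9)
)
find_similar_rows(toy, reference_row = 1)
#> [1] 3
```

Combining the per-column checks with Reduce(`&`, ...) means a row with NA in any compared column is never returned, because which() skips NA values.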