Reputation: 323
This is more of a general question that I haven't been able to find an answer to. I am trying to find the correlation between 2 data sets, with the goal of matching them above a certain correlation threshold. They won't be exact matches: most points will be within 1% of each other, but there will likely be some outliers. For example, every 100th point might be off by 5%, possibly more.
I am also trying to find instances where one data set matches another but has a different magnitude. For example, if you multiplied all of the data in one set by a constant multiplier, you would get a match. It obviously wouldn't make sense to loop through a ton of possible multipliers. I'm contemplating reducing each slope to its sign (+1 for positive, -1 for negative), since comparing the raw slope values would not work across different magnitudes. However, this would fail in some instances: the data is very granular, so two sets might match in overall shape while the point-to-point slopes differ once you zoom in.
Are there any built-in functions for this in R? I don't have a statistical background, and my searches mostly turned up how to handle outliers within a single data set.
Upvotes: 0
Views: 3624
Reputation: 1029
For a basic Pearson, Spearman, or Kendall correlation, you can use the cor() function:
x <- c(1, 2, 5, 7, 10, 15)
y <- c(2, 4, 6, 9, 12, 13)
cor(x, y, use="pairwise.complete.obs", method="pearson")
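You're going to want to adjust the "use" and "method" options based on your data: "use" controls how missing values (NA) are handled, and "method" selects the coefficient ("pearson", "kendall", or "spearman"). Since you didn't describe the nature of your data, I can't give you any more specific guidance.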
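As a rough illustration of how cor() relates to the two concerns in the question (a constant multiplier and occasional outliers), here is a minimal sketch. The data is entirely made up: the series, the multiplier of 3.7, the 1% noise, and the 5% outliers at every 100th point are assumptions chosen to mimic the description, not the asker's real data. It relies on the fact that Pearson correlation does not change when one series is multiplied by a positive constant, so there should be no need to loop over candidate multipliers. The regression through the origin at the end is just one possible way to estimate the multiplier afterwards.

```r
# Hypothetical data: y is x times an unknown multiplier, with about 1% noise.
set.seed(42)
x <- cumsum(rnorm(500))                      # made-up series
y <- 3.7 * x * (1 + rnorm(500, sd = 0.01))   # 3.7 is an assumed multiplier

# Push every 100th point off by a further 5% to mimic the outliers described.
idx <- seq(100, 500, by = 100)
y[idx] <- y[idx] * 1.05

# Pearson measures linear association; Spearman works on ranks and is
# usually less sensitive to a few outlying points.
pearson  <- cor(x, y, method = "pearson")
spearman <- cor(x, y, method = "spearman")

# Rescaling one series by a positive constant leaves the correlation
# unchanged, so the multiplier does not need to be known in advance.
scale_invariant <- isTRUE(all.equal(cor(x, y), cor(x, 10 * y)))

# One way to estimate the multiplier once the series are known to match:
# a least-squares fit through the origin (no intercept term).
multiplier <- coef(lm(y ~ 0 + x))[["x"]]

print(c(pearson = pearson, spearman = spearman, multiplier = multiplier))
```

If the outliers in the real data are large enough to distort the Pearson value, comparing it with the Spearman value is a cheap sanity check; a big gap between the two suggests a few points are dominating the result.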
Upvotes: 1