Sam Blackburn

Box-Cox Transformation Error: object 'x' not found

Hopefully a relatively easy one for those more experienced than me!

I'm trying to perform a Box-Cox transformation using the following code:

fit <- lm(ABOVEGROUND_BIO ~ TREATMENT * P_LEVEL, data = MYCORRHIZAL_VARIANCE)
bc <- boxcox(fit)
lambda<-with(bc, x[which.max(y)])
MYCORRHIZAL_VARIANCE$bc <- ((x^lambda)-1/lambda)
boxplot(bc ~ TREATMENT * P_LEVEL, data = MYCORRHIZAL_VARIANCE)

However, when I run it, I get the following error message:

Error: object 'x' not found. (on line 4)

For context, here's the str() output for my dataset:

Classes ‘spec_tbl_df’, ‘tbl_df’, ‘tbl’ and 'data.frame':    24 obs. of  14 variables:
 $ TREATMENT             : Factor w/ 2 levels "Mycorrhizal",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ P_LEVEL               : Factor w/ 2 levels "Low","High": 1 1 1 1 1 1 2 2 2 2 ...
 $ REP                   : int  1 2 3 4 5 6 1 2 3 4 ...
 $ ABOVEGROUND_BIO       : num  7.5 6.8 5.3 6 6.7 7 12 12.7 12 10.2 ...
 $ BELOWGROUND_BIO       : num  3 2.4 2 4 2.7 3.6 7.9 8.8 9.5 9.2 ...
 $ ROOT_SHOOT            : num  0.4 0.35 0.38 0.67 0.4 0.51 0.66 0.69 0.79 0.9 ...
 $ ROOT_SHOOT.log        : num  -0.916 -1.05 -0.968 -0.4 -0.916 ...
 $ ABOVEGROUND_BIO.log   : num  2.01 1.92 1.67 1.79 1.9 ...
 $ ABOVEGROUND_BIO.sqrt  : num  2.74 2.61 2.3 2.45 2.59 ...
 $ ABOVEGROUND_BIO.cubert: num  1.96 1.89 1.74 1.82 1.89 ...
 $ BELOWGROUND_BIO.log   : num  1.099 0.875 0.693 1.386 0.993 ...
 $ BELOWGROUND_BIO.sqrt  : num  1.73 1.55 1.41 2 1.64 ...
 $ BELOWGROUND_BIO.cubert: num  1.44 1.34 1.26 1.59 1.39 ...
 $ TOTAL_BIO             : num  10.5 9.2 7.3 10 9.4 10.6 19.9 21.5 21.5 19.4 ...
 - attr(*, "spec")=
  .. cols(
  ..   TREATMENT = col_factor(levels = c("Mycorrhizal", "Non-mycorrhizal"), ordered = FALSE, include_na = FALSE),
  ..   P_LEVEL = col_factor(levels = c("Low", "High"), ordered = FALSE, include_na = FALSE),
  ..   REP = col_integer(),
  ..   ABOVEGROUND_BIO = col_number(),
  ..   BELOWGROUND_BIO = col_number(),
  ..   ROOT_SHOOT = col_number()
  .. )

I understand there's no variable named bc in the MYCORRHIZAL_VARIANCE dataset, but I'm just following the basic instructions I was given for performing a Box-Cox transformation, and I'm confused about what 'x' should actually be, since I thought 'x' was being defined in line 3. Any suggestions on how to fix this error?

Thanks in advance!

Answers (1)

Gregor Thomas

I thought 'x' was being defined in line 3?

Line 3 is lambda<-with(bc, x[which.max(y)]). It doesn't define x; it defines lambda. It does use x, which with() looks up inside the bc list. If you're using boxcox() from the MASS package, bc should indeed contain x and y components, so line 3 runs fine. The error comes from line 4, where x is used outside with(), so R looks for a standalone object called x, and there isn't one. Writing bc$x there instead wouldn't give you the same error message; I'd expect an error about the replacement length. Because...

bc$x holds the candidate lambda values that boxcox() tried. You're using the default grid, seq(-2, 2, 1/10), which has 41 values, so bc$x has far more values than the 24 rows in your data, and assigning it to a new column fails because the lengths don't match.
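To see what boxcox() actually returns, here is a minimal sketch. The question's data isn't available, so it assumes the built-in warpbreaks data as a stand-in for MYCORRHIZAL_VARIANCE (positive response, two crossed factors). It passes plotit = FALSE so that nothing is drawn and bc$x is simply the default lambda grid.

```r
library(MASS)  # boxcox() lives in MASS

# Assumption: warpbreaks (built into R) stands in for MYCORRHIZAL_VARIANCE;
# like the question's data it has a positive response and two crossed factors.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
bc <- boxcox(fit, plotit = FALSE)  # plotit = FALSE: compute only, no plot

names(bc)         # should list x (candidate lambdas) and y (log-likelihoods)
length(bc$x)      # 41, one per value of seq(-2, 2, 1/10)
nrow(warpbreaks)  # 54 rows, so bc$x can't become a column

# Inside with(), x and y are looked up in bc, so this line works:
lambda <- with(bc, x[which.max(y)])

# Outside with(), a bare x is looked up in the global environment, where
# no object called x exists. Using bc$x instead fails for a different
# reason: 41 values can't fill a 54-row column.
df <- warpbreaks
res <- tryCatch({
  df$bc <- bc$x
  "assigned"
}, error = function(e) e)
conditionMessage(res)
```

The same length mismatch applies to the question's 24-row data: only the single chosen lambda is needed afterwards, not the whole bc$x vector.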

Line 3 picks out the lambda value that maximizes the log-likelihood, so you shouldn't need the rest of the values in bc again. I'd expect you to use that lambda value to transform your response variable, since that's what the Box-Cox transformation is for. ((x^lambda)-1/lambda) doesn't make any statistical or programmatic sense. Use this instead:

MYCORRHIZAL_VARIANCE$bc <- (MYCORRHIZAL_VARIANCE$ABOVEGROUND_BIO ^ lambda - 1) / lambda

(Note that I also corrected the parentheses. You want (y ^ lambda - 1) / lambda, not (y ^ lambda) - 1 / lambda.)
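Putting the fix together, here is a minimal end-to-end sketch. It again assumes warpbreaks as a stand-in for the question's data, and box_cox_transform() is a hypothetical helper, not a MASS function. The helper also covers lambda = 0, where the Box-Cox transformation is defined as log(y); the one-liner above would return NaN there, because it computes 0/0.

```r
library(MASS)  # boxcox() lives in MASS

# Hypothetical helper (not part of MASS): Box-Cox transform of a positive
# vector y. When lambda is (numerically) zero it returns log(y), the limit
# of (y^lambda - 1) / lambda as lambda -> 0.
box_cox_transform <- function(y, lambda, tol = 1e-8) {
  stopifnot(all(y > 0))
  if (abs(lambda) < tol) log(y) else (y^lambda - 1) / lambda
}

# Assumption: warpbreaks stands in for MYCORRHIZAL_VARIANCE,
# breaks for ABOVEGROUND_BIO, and wool/tension for TREATMENT/P_LEVEL.
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
bc <- boxcox(fit, plotit = FALSE)
lambda <- with(bc, x[which.max(y)])  # lambda with the highest log-likelihood

# Transform the response itself (one value per row), not bc$x.
dat <- warpbreaks
dat$bc <- box_cox_transform(dat$breaks, lambda)
boxplot(bc ~ wool * tension, data = dat)

# Why the parentheses matter, with y = 4 and lambda = 0.5:
(4^0.5 - 1) / 0.5  # (2 - 1) / 0.5 = 2, the Box-Cox value
(4^0.5) - 1 / 0.5  # 2 - 2 = 0, because / binds tighter than -
```

With the real data, the equivalent call would be box_cox_transform(MYCORRHIZAL_VARIANCE$ABOVEGROUND_BIO, lambda), followed by the question's original boxplot() line.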
