Paul

Reputation: 1117

Sparse partial least squares regression

I have two data sets:

     http://www.filedropper.com/dataa_1 ## DataA
     http://www.filedropper.com/datab   ## DataB

DataA has 42 rows and 8 columns, and DataB has 42 rows and 6 columns. We want to run CCA and sPLS on both data sets in R. My question is this: in DataB, groups of eleven rows always share the same values. Will this affect the results or cause a discrepancy in either the CCA or the sPLS?

Upvotes: 0

Views: 179

Answers (1)

Vincent Guillemot

Reputation: 3429

After looking at block B, it looks like the variables are discrete.

Using such variables in PLS or CCA is not a (technical) problem, but it does pose statistical "challenges": you may need the bootstrap or the jackknife to go further in the statistical interpretation of the results.
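To make the bootstrap idea concrete, here is a minimal, language-agnostic sketch in plain Python. It resamples rows with replacement and recomputes a statistic each time. Everything in it is an assumption made for illustration: the names `pearson` and `bootstrap_ci`, the toy data, and above all the statistic itself, where a simple correlation stands in for whatever you would actually refit on each resample (the CCA or sPLS model).

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns NaN when either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / (sxx * syy) ** 0.5


def bootstrap_ci(x, y, stat, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(x, y), resampling rows jointly."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        # Draw row indices with replacement so x and y stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        value = stat([x[i] for i in idx], [y[i] for i in idx])
        if value == value:  # NaN != NaN, so this skips degenerate resamples
            estimates.append(value)
    estimates.sort()
    lower = estimates[max(0, int((alpha / 2) * len(estimates)))]
    upper = estimates[min(len(estimates) - 1, int((1 - alpha / 2) * len(estimates)))]
    return lower, upper


# Hypothetical toy data: a continuous variable and a discrete one with ties.
x = [0.5, 1.2, 1.9, 2.4, 3.1, 3.8, 4.4, 5.0]
y = [1, 1, 1, 2, 2, 2, 3, 3]
lower, upper = bootstrap_ci(x, y, pearson)
```

Note the NaN guard: with discrete variables and many repeated rows, a resample can end up constant in one variable, which is exactly the kind of complication such data introduce.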

You should also ask yourself whether this "discrete" representation is accurate for your data. Treating the values as numbers may be wrong if the original variables are categorical, in which case you should use dummy variables.
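As an illustration of dummy coding, the sketch below expands one categorical column into 0/1 indicator columns, dropping the first level as the reference. The function name `dummy_code` and the example column are hypothetical and not taken from the data in the question.

```python
def dummy_code(values, drop_first=True):
    """Expand a categorical column into 0/1 indicator columns.

    Returns the levels that received a column and the encoded rows.
    """
    levels = sorted(set(values))
    # Dropping one level avoids perfectly collinear indicator columns.
    kept = levels[1:] if drop_first else levels
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


# Hypothetical column whose codes 1, 2, 3 are labels, not quantities.
column = [1, 1, 2, 3, 3, 2]
levels, encoded = dummy_code(column)
```

Each row of `encoded` then replaces the original single column in the block passed to CCA or sPLS.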

Upvotes: 1
