user1680636

Reputation: 51

R-squared values from separate linear regressions of several trends within one dataset

I'm stuck trying to compute the variance explained by a linear trend (R-squared) several times within a single data set, once for each station.

My data is structured like this:

x <- read.table(text = "
STA YEAR    VALUE
a   1968    457
a   1970    565
a   1972    489
a   1974    500
a   1976    700
a   1978    650
a   1980    659
b   1968    457
b   1970    565
b   1972    350
b   1974    544
b   1976    678
b   1978    650
b   1980    690
c   1968    457
c   1970    565
c   1972    500
c   1974    600
c   1976    678
c   1978    670
c   1980    750", header = TRUE)

and I am trying to get back something like this:

STA  R-sq
a    n1
b    n2
c    n3

where each n is the r-squared value for that station's data alone (VALUE regressed on YEAR using only that station's rows).

I have tried

fit <- lm(VALUE ~ YEAR + STA, data = x) 

intending it to model the yearly trend of VALUE for each individual station, over the years for which VALUE is available, within the master data set.
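Why that call gives a single number: `lm(VALUE ~ YEAR + STA)` fits one model to all 21 rows, with a shared YEAR slope and a separate intercept per station, so `summary()` reports one R-squared for the whole fit. Even allowing each station its own slope through an interaction (`YEAR * STA`) still yields one overall R-squared; the per-station slopes appear as coefficients instead. A minimal sketch, assuming R's default treatment contrasts and re-creating the example data inline (`pooled`, `inter`, `cf` and `slopes` are illustrative names):

```r
# Re-create the example data from the question
x <- data.frame(
  STA   = rep(c("a", "b", "c"), each = 7),
  YEAR  = rep(seq(1968, 1980, by = 2), times = 3),
  VALUE = c(457, 565, 489, 500, 700, 650, 659,
            457, 565, 350, 544, 678, 650, 690,
            457, 565, 500, 600, 678, 670, 750)
)

# The model from the question: one shared YEAR slope plus a separate
# intercept for each station, fitted to all 21 rows at once
pooled <- summary(lm(VALUE ~ YEAR + STA, data = x))
pooled$r.squared   # a single value for the whole data set

# Interaction model: every station gets its own intercept and slope,
# yet summary() still reports only one overall R-squared
inter <- lm(VALUE ~ YEAR * STA, data = x)
cf <- coef(inter)

# With treatment contrasts, "YEAR" is station a's slope and
# "YEAR:STAb" / "YEAR:STAc" are the differences from it
slopes <- cf[["YEAR"]] + c(a = 0, b = cf[["YEAR:STAb"]], c = cf[["YEAR:STAc"]])
slopes
```

So per-station R-squared values require fitting each station separately, which is what the answers below do.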

Any help would be greatly appreciated. I am really stumped on this one, and I know it is just a matter of unfamiliarity with R.

Upvotes: 2

Views: 417

Answers (3)

FraNut

Reputation: 686

# first load the data.table package
library(data.table)
# convert your data frame to a data.table (I'm using your example)
x <- as.data.table(x)
# fit one model per station and extract the metrics needed (r^2, F statistic and so on)
x[, {
  s <- summary(lm(VALUE ~ YEAR))
  list(r2 = s$r.squared, f = s$fstatistic[1])
}, by = STA]
   STA        r2         f
1:   a 0.6286064  8.462807
2:   b 0.5450413  5.990009
3:   c 0.8806604 36.897258
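If data.table isn't available, the same per-station table can probably be built in base R by splitting on STA and calling `summary()` once per group. A minimal sketch, re-creating the example data inline; `fits` and `stats` are illustrative names, not part of any package:

```r
# Re-create the example data from the question
x <- data.frame(
  STA   = rep(c("a", "b", "c"), each = 7),
  YEAR  = rep(seq(1968, 1980, by = 2), times = 3),
  VALUE = c(457, 565, 489, 500, 700, 650, 659,
            457, 565, 350, 544, 678, 650, 690,
            457, 565, 500, 600, 678, 670, 750)
)

# One summary.lm object per station; split() orders groups by sorted STA
fits <- lapply(split(x, x$STA), function(d) summary(lm(VALUE ~ YEAR, data = d)))

# Pull R-squared and the F statistic out of each summary
stats <- data.frame(
  STA = names(fits),
  r2  = vapply(fits, function(s) s$r.squared, numeric(1)),
  f   = vapply(fits, function(s) s$fstatistic[["value"]], numeric(1)),
  row.names = NULL
)
stats
```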

Upvotes: 1

Ben

Reputation: 42313

To get the r-squared of VALUE ~ YEAR for each STA group, you can take this previous answer, modify it slightly and plug in your values:

# assuming x is your data frame (make sure you don't have Hmisc loaded; it will interfere)
library(plyr)
models_x <- dlply(x, "STA", function(df)
     summary(lm(VALUE ~ YEAR, data = df)))

# extract the r.squared values
rsqds <- ldply(seq_along(models_x), function(i) models_x[[i]]$r.squared)
# give names to rows and col (take the labels from models_x so they match the order of the fits)
rownames(rsqds) <- names(models_x)
colnames(rsqds) <- "rsq"
# have a look
rsqds
        rsq
a 0.6286064
b 0.5450413
c 0.8806604
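A caveat about labelling the results: `unique(x$STA)` returns stations in order of first appearance, whereas base `split()` (and, as far as I know, plyr's `dlply()`) orders the groups by sorted key. On already-sorted data like the example they agree, but on shuffled rows the labels can end up attached to the wrong R-squared values. A base-R sketch that deliberately reorders the rows to show the difference; `x_shuffled`, `fits` and `rsq` are illustrative names:

```r
# Re-create the example data from the question
x <- data.frame(
  STA   = rep(c("a", "b", "c"), each = 7),
  YEAR  = rep(seq(1968, 1980, by = 2), times = 3),
  VALUE = c(457, 565, 489, 500, 700, 650, 659,
            457, 565, 350, 544, 678, 650, 690,
            457, 565, 500, 600, 678, 670, 750)
)

# Move station c's rows to the front to simulate unsorted input
x_shuffled <- x[c(15:21, 1:14), ]

fits <- lapply(split(x_shuffled, x_shuffled$STA),
               function(d) summary(lm(VALUE ~ YEAR, data = d)))
rsq <- vapply(fits, function(s) s$r.squared, numeric(1))

unique(x_shuffled$STA)  # "c" "a" "b": order of first appearance
names(fits)             # "a" "b" "c": sorted, same order as rsq

# Safe labelling: use the names carried by the list of fits
rsqds <- data.frame(rsq = unname(rsq), row.names = names(fits))
rsqds
```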

EDIT: Following mnel's suggestion, here are more efficient ways to get the r-squared values into a nice table (no need to add row and column names):

# starting with models_x from above
rsqds <- data.frame(rsq =sapply(models_x, '[[', 'r.squared'))

# starting with just the original data in x, this is great:
rsqds  <- ddply(x, "STA", summarize, rsq = summary(lm(VALUE ~ YEAR))$r.squared)

  STA       rsq
1   a 0.6286064
2   b 0.5450413
3   c 0.8806604
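As a cross-check on these numbers: for a straight-line fit with an intercept and a single predictor, R-squared equals the squared Pearson correlation between the predictor and the response, so the same table can be computed in base R without fitting any models. This only holds for the one-predictor case, not for models with extra terms. A minimal sketch, re-creating the example data inline (`rsq_cor` is an illustrative name):

```r
# Re-create the example data from the question
x <- data.frame(
  STA   = rep(c("a", "b", "c"), each = 7),
  YEAR  = rep(seq(1968, 1980, by = 2), times = 3),
  VALUE = c(457, 565, 489, 500, 700, 650, 659,
            457, 565, 350, 544, 678, 650, 690,
            457, 565, 500, 600, 678, 670, 750)
)

# For VALUE ~ YEAR (one predictor plus intercept), R-squared = cor(YEAR, VALUE)^2
rsq_cor <- sapply(split(x, x$STA), function(d) cor(d$YEAR, d$VALUE)^2)

data.frame(STA = names(rsq_cor), rsq = unname(rsq_cor))
```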

Upvotes: 2

Anthony Damico

Reputation: 6114

There's only one r-squared value for that model, not three. Please edit your question.

# store the output 
y <- summary( lm( VALUE ~ YEAR + STA , data = x ) )
# access the attributes of `y`
attributes( y )
y$r.squared
y$adj.r.squared
y$coefficients
y$coefficients[,1]

# or are you looking to run three separate lm() calls,
# one each for 'a', 'b' and 'c'? this would be the first:
y <- summary( lm( VALUE ~ YEAR , data = x[ x$STA %in% 'a' , ] ) )
# access the attributes of `y`
attributes( y )
y$r.squared
y$adj.r.squared
y$coefficients
y$coefficients[,1]
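If the second reading is the intended one, the same subsetting idea can be wrapped in a loop over every station, collecting both R-squared and adjusted R-squared into one table. A minimal sketch, re-creating the example data inline; `stations`, `res` and the column names are illustrative choices:

```r
# Re-create the example data from the question
x <- data.frame(
  STA   = rep(c("a", "b", "c"), each = 7),
  YEAR  = rep(seq(1968, 1980, by = 2), times = 3),
  VALUE = c(457, 565, 489, 500, 700, 650, 659,
            457, 565, 350, 544, 678, 650, 690,
            457, 565, 500, 600, 678, 670, 750)
)

stations <- sort(unique(x$STA))
res <- data.frame(STA = stations, r.squared = NA_real_, adj.r.squared = NA_real_)

# Fit VALUE ~ YEAR on each station's rows in turn and store the summaries
for (i in seq_along(stations)) {
  y <- summary(lm(VALUE ~ YEAR, data = x[x$STA %in% stations[i], ]))
  res$r.squared[i]     <- y$r.squared
  res$adj.r.squared[i] <- y$adj.r.squared
}
res
```

With 7 points and one predictor, the adjusted value is 1 - (1 - R^2) * 6/5, so it is noticeably lower than the raw R-squared for these short series.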

Upvotes: 0
