c0ba1t

Reputation: 241

Summarise and lm model error

When trying to create a table of statistics I am running into an issue with the summary.lm r.squared value.

First, I read in my data from a CSV file:

df <- as.data.frame(read.csv("BCO.csv", header = TRUE, stringsAsFactors = FALSE))
df <- df[,2:4]

Then I fitted a linear trend for each station:

CLDD_trend <- ddply(df, .(STATION_NAME), function(z)coef(lm(CLDD_yr ~ year, data = z)))

This next line is where I run into a problem:

CLDD_rsq <- ddply(df, .(STATION_NAME), summarise, rsq = summary(lm(CLDD_yr ~ year))$r.squared)

I get this error:

Error: invalid term in model formula

Here is the head of df:

> head(df)
                    STATION_NAME year CLDD_yr
1 ALBUQUERQUE FOOTHILLS NE NM US 1992    3341
2 ALBUQUERQUE FOOTHILLS NE NM US 1993    4443
3 ALBUQUERQUE FOOTHILLS NE NM US 1994    5319
4 ALBUQUERQUE FOOTHILLS NE NM US 1995    5070
5 ALBUQUERQUE FOOTHILLS NE NM US 1996    5338
6 ALBUQUERQUE FOOTHILLS NE NM US 1997    5105

And the head of CLDD_trend:

> head(CLDD_trend)
                             STATION_NAME (Intercept)      year
1          ALBUQUERQUE FOOTHILLS NE NM US -185183.485 95.159091
2 ALBUQUERQUE INTERNATIONAL AIRPORT NM US -138428.871 73.121774
3                   ALBUQUERQUE VLY NM US -138218.809 72.243478
4           PETROGLYPH NATIONAL MON NM US  -95959.130 51.074086
5                       SANDIA PARK NM US    7758.845 -3.439124

My aim is to append a new column to CLDD_trend that contains the r.squared value from each station's model summary:

stat <- cbind(CLDD_trend[, c(1, 3)], CLDD_rsq$rsq)  # c(1, 3) picks columns 1 and 3; 1&3 evaluates to TRUE and keeps every column

Can you see where the error in my model is? I am stumped.

Upvotes: 1

Views: 80

Answers (2)

Richard Telford

Reputation: 9923

This is how to extract both statistics (the coefficients and R-squared) with a single model fit per station:

CLDD_trend <- ddply(df, .(STATION_NAME), function(z){
  mod <- lm(CLDD_yr ~ year, data = z)
  c(coef(mod), rsq = summary(mod)$r.squared)
})
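
For anyone who wants to try this without the original CSV, here is a minimal sketch on made-up data. The `toy` data frame, its two station names, and the helper name `fit_station` are assumptions for illustration only; the model and the `ddply` call follow the code above.

```r
library(plyr)

# Made-up data for two hypothetical stations (not the asker's BCO.csv)
set.seed(1)
toy <- data.frame(
  STATION_NAME = rep(c("STATION A", "STATION B"), each = 10),
  year = rep(2001:2010, times = 2),
  stringsAsFactors = FALSE
)
toy$CLDD_yr <- 3000 + 50 * (toy$year - 2000) + rnorm(nrow(toy), sd = 100)

# Fit one model per station and return the coefficients plus R-squared
fit_station <- function(z) {
  mod <- lm(CLDD_yr ~ year, data = z)
  c(coef(mod), rsq = summary(mod)$r.squared)
}

# One row per station: STATION_NAME, (Intercept), year (the slope), rsq
toy_stats <- ddply(toy, .(STATION_NAME), fit_station)
print(toy_stats)
```

Because the model is fitted once per station and everything is returned from the same function, there is no need for a separate `summarise` call or a later `cbind`.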

Upvotes: 1

c0ba1t

Reputation: 241

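Turns out I had a conflict between dplyr and plyr. Both packages export a function called summarise, so with dplyr attached after plyr, ddply was presumably picking up the dplyr version.

I solved the issue by detaching the conflicting package and loading plyr before rerunning the code: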

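detach("package:dplyr")  # name the package explicitly; a bare detach() removes whatever is second on the search path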
library(plyr)
df <- as.data.frame(read.csv("BCO.csv", header = TRUE, stringsAsFactors = FALSE))
df <- df[,2:4]
CLDD_trend <- ddply(df, .(STATION_NAME), function(z)coef(lm(CLDD_yr ~ year, data = z)))
CLDD_rsq <- ddply(df, .(STATION_NAME), summarise, rsq = summary(lm(CLDD_yr ~ year))$r.squared)
stat <- cbind(CLDD_trend[, c(1, 3)], CLDD_rsq$rsq)  # c(1, 3) selects columns 1 and 3; 1&3 is just TRUE
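
An alternative that may avoid detaching anything is to name the package explicitly with `plyr::summarise`, so it no longer matters which package was attached last. This is only a sketch on made-up data: the `toy` data frame and its station names are assumptions, and I have not run it against the original BCO.csv.

```r
library(plyr)

# Made-up data for two hypothetical stations (not the original BCO.csv)
set.seed(1)
toy <- data.frame(
  STATION_NAME = rep(c("STATION A", "STATION B"), each = 10),
  year = rep(2001:2010, times = 2),
  stringsAsFactors = FALSE
)
toy$CLDD_yr <- 3000 + 50 * (toy$year - 2000) + rnorm(nrow(toy), sd = 100)

# Qualify summarise with its namespace so the plyr version is always used
toy_rsq <- ddply(toy, .(STATION_NAME), plyr::summarise,
                 rsq = summary(lm(CLDD_yr ~ year))$r.squared)
print(toy_rsq)
```

The result has one row per station with an `rsq` column, which can then be bound to the trend table as in the last line above.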

Upvotes: 1
