Delbudge

Reputation: 59

R Linear Regression - Trouble setting values

I am new to R and I need help getting some values from my data set. The data are dollar amounts per year for a list of cities. I'm trying to set up my values so that I can run a linear regression model on the dataset, which is named estimate.

estimate <- read.csv("estimate.csv", check.names = FALSE) #Import
estimate

location  2010  2011  2012  2013  2014
city1     200   250   300   500   600
city2     300   300   400   650   780
city3     500   600   700   800   900

I am only interested in the data for city3 for the years shown.

I know I can just use the code years <- c(2010,2011,2012,2013,2014) to create my years variable, but I know that is only practical for small tables.

For my linear model I would like to first plot(years, values), where the years are the headers of columns 2:6 and the corresponding values come from row 3 only. When I run values <- estimate[3, c(3,2:6)] I get the data for the values, but when I try to do the same thing for the years with years <- estimate[0, c(0,2:6)] I get a data frame with 0 observations of 5 variables. Trying to plot that gives me

Error in plot.window(...) : need finite 'xlim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf

Ideally I would like the data set up like this:

years        values
2010         500
2011         600
2012         700
2013         800
2014         900

And I can then run an lm function. Thanks ahead of time. I'm real new at this stuff in R and on Stack so please forgive my newbishness.

Upvotes: 2

Views: 80

Answers (2)

G. Grothendieck

Reputation: 270055

1) extraction. Assuming the data shown reproducibly in the Note at the end, we can extract the years from the column names and the city3 values from row 3, then perform the regression like this:

year <- as.numeric(names(estimate)[-1])
city3 <- unlist(estimate[3, -1])
lm(city3 ~ year)
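
As a quick check of approach 1, here is a minimal sketch that also draws the plot the question asks for and overlays the fitted line. It rebuilds estimate from the Note so it runs on its own. The name fit is only illustrative, and selecting the row by Location == "city3" instead of by position is my assumption about what is more robust, not something the question requires.

```r
# Rebuild the sample data from the Note so this runs on its own
estimate <- read.csv(text = "Location,2010,2011,2012,2013,2014
city1,200,250,300,500,600
city2,300,300,400,650,780
city3,500,600,700,800,900", check.names = FALSE)

# The years live in the column names, not in any row
year <- as.numeric(names(estimate)[-1])

# Pick the city3 row by name (assumption: safer than hard-coding row 3)
city3 <- unlist(estimate[estimate$Location == "city3", -1])

fit <- lm(city3 ~ year)   # 'fit' is an illustrative name
plot(year, city3)         # scatter plot of values against years
abline(fit)               # overlay the fitted regression line
coef(fit)                 # intercept and slope
```

With this sample data the city3 values rise by exactly 100 per year, so the points fall on the fitted line.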

2) melt. Alternatively, we can convert estimate to long form (here 15x3), fix up the names, make Year numeric, and then perform the regression:

library(reshape2)

long <- melt(estimate, id = "Location")
names(long) <- c("Location", "Year", "Estimate")
long$Year <- as.numeric(as.character(long$Year))

lm(Estimate ~ Year, long, subset = Location == "city3")

2a) reshape. Converting from wide to long form can also be done without any packages, like this:

yrs <- names(estimate)[-1]
long <- reshape(estimate, dir = "long", idvar = "Location", 
  varying = list(yrs), times = as.numeric(yrs), timevar = "Year", v.names = "Estimate")

lm(Estimate ~ Year, long, subset = Location == "city3")

Note:

Lines <- "
Location,2010,2011,2012,2013,2014
city1,200,250,300,500,600
city2,300,300,400,650,780
city3,500,600,700,800,900"
estimate <- read.csv(text = Lines, check.names = FALSE)

Upvotes: 1

lebelinoz

Reputation: 5068

When you read csv files with read.csv, the first row of the file becomes the column names of your data frame rather than a row of data. Try

names = colnames(estimate)

You'll see that names is a character vector c("location", "2010", "2011", ...). You can translate this to years by dropping the first item and converting to numeric:

years = as.numeric(names[-1])
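
To get from there to the two-column layout asked for in the question, something like the following sketch should work. It assumes the sample table from the question (with the lowercase location header) and that city3 is in row 3. The data frame name city3 is only illustrative. Note that estimate[0, ...] comes back empty because row index 0 selects no rows; the years are stored in the column names.

```r
# Sample data from the question, inlined so the sketch is self-contained
estimate <- read.csv(text = "location,2010,2011,2012,2013,2014
city1,200,250,300,500,600
city2,300,300,400,650,780
city3,500,600,700,800,900", check.names = FALSE)

# Years come from the header: drop "location", convert the rest to numbers
years <- as.numeric(colnames(estimate)[-1])

# Values come from row 3 (assumed to be city3), flattened to a plain vector
values <- unlist(estimate[3, -1], use.names = FALSE)

# Two-column layout: one row per year ('city3' is an illustrative name)
city3 <- data.frame(years = years, values = values)
city3

plot(city3$years, city3$values)
lm(values ~ years, data = city3)
```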

Upvotes: 0
