Reputation: 309
I'm trying to analyze a data set in R where I have sales of items over time and I want to understand the impact of categorical variables on the quantity sold.
library("data.table")
qty <- c(100,10000,100,200,150,9000)
flavour <- c("Mint","Herb","Mint","Mint","Herb","Fruit")
category <- c("Multiple","Multiple","White","Multiple","Other","White")
sales_data <- data.frame(qty, flavour, category, stringsAsFactors = TRUE)
str(sales_data)
'data.frame': 6 obs. of 3 variables:
$ qty : num 100 10000 100 200 150 9000
$ flavour : Factor w/ 3 levels "Fruit","Herb",..: 3 2 3 3 2 1
$ category: Factor w/ 3 levels "Multiple","Other",..: 1 1 3 1 2 3
I've been looking at multiple regression and simple linear regression, but I feel I might be on the wrong track. My understanding is that simple linear regression determines the relationship between two continuous variables. I can see there is a way to use multiple regression to understand the relationship between categorical variables and a continuous one, but the examples I've found seem to stop at binary values (does someone smoke or not, for example). Given that each of my categorical variables has more than two values, is multiple regression the right way to go, or have I gone completely off track?
My actual data set has around 10 categorical variables, some of which relate to location, others which relate to brands.
Any help would be greatly appreciated. Apologies if this is in the wrong place or I've missed something obvious. I'm learning stats and R at the same time, so I'm getting confused quickly.
Upvotes: 2
Views: 2149
Reputation: 269586
You can certainly have a continuous dependent variable (qty) and a mix of continuous and categorical predictors, and the categorical predictors don't have to be binary. The categorical variables should be of class "factor". For the two categorical/factor variables shown in the question:
fm <- lm(qty ~ ., data = sales_data)
summary(fm)
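To see why non-binary factors are not a problem, it may help to look at the design matrix that lm builds. The sketch below rebuilds the toy data from the question and assumes R's default treatment contrasts, under which a factor with k levels becomes k - 1 indicator (0/1) columns, with the first level serving as the baseline. The stringsAsFactors = TRUE argument and the relevel step are additions for illustration, not part of the original code.

```r
# Rebuild the question's toy data. stringsAsFactors = TRUE is an assumption
# added here: since R 4.0, data.frame() no longer converts strings to factors.
qty <- c(100, 10000, 100, 200, 150, 9000)
flavour <- c("Mint", "Herb", "Mint", "Mint", "Herb", "Fruit")
category <- c("Multiple", "Multiple", "White", "Multiple", "Other", "White")
sales_data <- data.frame(qty, flavour, category, stringsAsFactors = TRUE)

fm <- lm(qty ~ ., data = sales_data)

# Each 3-level factor is expanded into 2 indicator columns; the first level
# ("Fruit" for flavour, "Multiple" for category) is absorbed into the intercept.
X <- model.matrix(fm)
colnames(X)
# "(Intercept)" "flavourHerb" "flavourMint" "categoryOther" "categoryWhite"

# Optionally choose a different baseline level before refitting,
# so coefficients are read as differences from "Mint" instead of "Fruit".
sales_data2 <- sales_data
sales_data2$flavour <- relevel(sales_data2$flavour, ref = "Mint")
fm2 <- lm(qty ~ ., data = sales_data2)
names(coef(fm2))
```

Each coefficient in summary(fm) is then the estimated difference in qty between that level and the baseline level of the same factor, holding the other factor fixed. With only six rows and five coefficients, the toy fit leaves a single residual degree of freedom, so the estimates here are illustrative only.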
Upvotes: 2