Reputation: 31
I have a question about R programming.
If I have a dataset like the following:
LA NY MA
1 2 3
4 5 6
3 5
4
(In other words, not all columns have the same number of values.)
I am trying to use lm to perform an ANOVA test (to decide whether the mean number is the same in each state), and it keeps showing "an error occurred" because the rows do not match. How can I fix this issue?
Also, when I do lm, I usually do lm(y~x), so if I want to do lm(y~LA), then there's no y variable to type in. Should I create a new column/row for this?
Upvotes: 0
Views: 245
Reputation: 738
You can use gather() from the tidyr package to reshape the data into long format for analysis. It takes multiple columns and gathers them into key-value pairs, making “wide” data longer.
Sample code:
LA <- c(1,4,3,4)
NY <- c(4,5,6, NA)
MA <- c(3,6, NA, NA)
df <- data.frame(LA, NY, MA) # data in wide format
library(tidyr)
df <- df %>% gather(attribute, value) # data in long format
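Once the data is in long format, the value column can serve as the y variable and the state column as the grouping factor for lm. Here is a minimal sketch, assuming the same sample values as above. It uses base R's stack() in place of gather() so that it runs without extra packages, and the names df_wide, df_long and fit are only illustrative:

```r
# Sample data in wide format; NA marks the missing cells
df_wide <- data.frame(
  LA = c(1, 4, 3, 4),
  NY = c(4, 5, 6, NA),
  MA = c(3, 6, NA, NA)
)

# stack() is a base R alternative to gather(): it returns one numeric
# column ("values") and one factor column ("ind") holding the column names
df_long <- stack(df_wide)
names(df_long) <- c("value", "attribute")

# Under R's default na.action (na.omit), lm() drops the rows with NA,
# so the unequal group sizes are not a problem
fit <- lm(value ~ attribute, data = df_long)

# One-way ANOVA table: tests whether the group means differ
anova(fit)
```

Here value plays the role of y and attribute the role of x, so no separate y column has to be created by hand.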
Upvotes: 0
Reputation: 99371
Maybe you could do something like this. To read the data, use the fill argument in read.table. Where the code below has text = txt, you would put your file name instead.
(dat <- read.table(text = txt, header = TRUE, fill = TRUE))
# LA NY MA
# 1 1 2 3
# 2 4 5 6
# 3 3 5 NA
# 4 4 NA NA
Then we can take the column means and create a new two column data frame.
cm <- colMeans(dat, na.rm = TRUE)
data.frame(state = names(cm), mean = unname(cm))
# state mean
# 1 LA 3.0
# 2 NY 4.0
# 3 MA 4.5
where txt is
txt <- "LA NY MA
1 2 3
4 5 6
3 5
4"
Upvotes: 1