Reputation: 1
Long time lurker, first time poster.
I'm in an introductory R course and I'm trying to create histograms and summaries for the age of diagnosis with diabetes "diabage2" and their insulin use "insulin" (Yes/No/NA). The dataset is brfss2013.
My first attempt was:
brfss2013 %>% group_by(insulin = "Yes") %>% summarise(MEAN = mean(brfss2013$diabage2, na.rm = TRUE), n = n())
insulin MEAN n
<chr> <dbl> <int>
1 Yes 51.48694 491775
This looks fine, except I know that MEAN and n are the mean and count for the whole sample, not for the selected part of it. (I've had this problem in another part of my project and I'm not sure why it's not working, but I can verify that the answer is incorrect.)
I then tried to use subset() to select only the data that met my conditions, so I could easily summarise it and make histograms (i.e. one group of data where insulin is "Yes" and one where insulin is "No"):
wInsulin <- subset(brfss2013, insulin = "Yes", select = c(diabage2))
woInsulin <- subset(brfss2013, insulin = "No", select = c(diabage2))
These looked the same, even though they shouldn't contain any of the same observations since they're mutually exclusive.
When I tried to use select() to trim down the set I'm using from 330 variables to three, I encountered another problem:
InsulinData <- select(brfss2013$insulin, brfss2013$diabage, brfss2013$sex, brfss2013$X_state)
gave me the error
Error in UseMethod("select_") :
no applicable method for 'select_' applied to an object of class "factor"
Which I have no idea what to make of.
I feel like I'm missing something very fundamental, but my lack of experience means that I don't have the foundations to understand a lot of solutions to other people's problems and the course thus far has covered more statistical theory than the actual details of dealing with R. I would really appreciate any guidance I could get.
Upvotes: 0
Views: 9993
Reputation: 1
I had this error once; it turned out I had unknowingly converted my data.frame into a factor. Look in the global environment, under Type, to see what your data.frame is actually stored as.
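The same check can be done from the console. This is only a minimal sketch: `toy` is a made-up stand-in for brfss2013, and the values in it are invented for illustration.

```r
# Hypothetical stand-in for brfss2013 (invented values, illustration only)
toy <- data.frame(
  insulin  = factor(c("Yes", "No", "Yes")),
  diabage2 = c(45, 60, 52)
)

class(toy)          # "data.frame": what dplyr verbs such as select() expect
class(toy$insulin)  # "factor": a single column, not a data frame

# Guard that stops early if the object is not a data frame
stopifnot(is.data.frame(toy))
```

If class() on your own object reports "factor" rather than "data.frame", the object passed to select() is a single column and not the whole table.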
Upvotes: -1
Reputation: 44638
You almost had this:
InsulinData <- select(brfss2013$insulin,
brfss2013$diabage,
brfss2013$sex,
brfss2013$X_state)
Should be (using diabage2, the column name given in the question):
InsulinData <- select(brfss2013, insulin, diabage2, sex, X_state)
With dplyr you only need to specify the data.frame once, as the first argument. select thought you were trying to select columns from the variable brfss2013$insulin, which you can't do because it is a single factor column and not a data.frame.
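To make the difference concrete, here is a small sketch on invented data. The data frame `toy` and its values are assumptions made up for illustration; only the column names are borrowed from the question.

```r
library(dplyr)

# Hypothetical stand-in for brfss2013 (invented values, illustration only)
toy <- data.frame(
  insulin  = factor(c("Yes", "No", NA)),
  diabage2 = c(45, 60, 52),
  sex      = factor(c("Male", "Female", "Female")),
  X_state  = factor(c("Ohio", "Texas", "Ohio")),
  other    = 1:3
)

# The data frame goes first, followed by bare column names
small <- select(toy, insulin, diabage2, sex, X_state)
names(small)

# Passing a column as the first argument fails, much like in the question;
# try() captures the error so the script keeps running
res <- try(select(toy$insulin, toy$diabage2), silent = TRUE)
inherits(res, "try-error")
```

The exact wording of the error depends on the dplyr version, so it may not match the message in the question character for character.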
Also, your first set of instructions is a bit confusing:
group_by(insulin = "Yes")
You group with group_by(insulin), and you filter rows with filter(insulin == "Yes").
You probably want something more like:
brfss2013 %>%
group_by(insulin) %>%
summarise(MEAN = mean(diabage2, na.rm = TRUE), n = n())
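The sketch below puts grouping, filtering, and the subset() calls from the question side by side on invented data (`toy` and its values are assumptions for illustration only). As far as I can tell, writing insulin = "Yes" inside group_by() creates a new constant column rather than testing the existing one, and inside subset() it is treated as an unused named argument, so no rows are dropped. In both cases the comparison needs ==. Referring to brfss2013$diabage2 inside summarise() also bypasses the grouping, which is why the bare column name is used here.

```r
library(dplyr)

# Hypothetical stand-in for brfss2013 (invented values, illustration only)
toy <- data.frame(
  insulin  = factor(c("Yes", "No", "Yes", NA, "No", "Yes")),
  diabage2 = c(40, 60, 50, 55, 70, NA)
)

# Per-group summary: one row per value of insulin, including NA
by_insulin <- toy %>%
  group_by(insulin) %>%
  summarise(MEAN = mean(diabage2, na.rm = TRUE), n = n())

# Only the "Yes" rows: filter() with ==, which also drops rows where insulin is NA
yes_only <- toy %>%
  filter(insulin == "Yes") %>%
  summarise(MEAN = mean(diabage2, na.rm = TRUE), n = n())

# Base R version of the subsets from the question, using == instead of =
wInsulin  <- subset(toy, insulin == "Yes", select = diabage2)
woInsulin <- subset(toy, insulin == "No",  select = diabage2)

# One histogram per group
hist(wInsulin$diabage2,  main = "Insulin: Yes", xlab = "Age at diagnosis")
hist(woInsulin$diabage2, main = "Insulin: No",  xlab = "Age at diagnosis")
```

With the grouped version you get the mean and count for "Yes", "No", and NA in one table; the filtered version is handy when you only need one of the groups.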
Upvotes: 1