Vita D.

Reputation: 11

ggplot2 code error: object not found in the dataset

I am learning R as a beginner and am trying to generate a plot today by using the following code:

dailyActivity_merged_2 %>%
    group_by(ActivityDate) %>%
    select(Actlevl == "High") %>%
    summarise(average_distance = mean(TotalDistance)) %>%
    ggplot() +
    geom_col(mapping = aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
    scale_fill_gradient(low = "yellow", high = "red") +
    theme(axis.text.x = element_text(angle = 90)) +
    labs(title = "Average Distance vs. Time")

Running it returns the error below, even though I am sure the column in the dataset is named "Actlevl". I don't understand why R says the object is not found:

Error in select(Actlevl == "High") : object 'Actlevl' not found

Did I do something wrong? Maybe I should not use select() to choose the data value? I am trying to select the rows with "High" in column Actlevl.

Thank you so much for your help.

The dataset looks like this: [image: screenshot of the dataset]

Subset of the data:

> dput(dailyActivity_merged_2[1:35,c(1:5)])
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 
1503960366, 1503960366, 1503960366, 1624580081, 1624580081, 1624580081, 
1624580081), Actlevl = c("High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "High", "High", "High", "High", "High", "High", 
"High", "High", "Low", "Low", "Low", "Low"), ActivityDate = c("4/12/2016", 
"4/13/2016", "4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016", 
"4/18/2016", "4/19/2016", "4/20/2016", "4/21/2016", "4/22/2016", 
"4/23/2016", "4/24/2016", "4/25/2016", "4/26/2016", "4/27/2016", 
"4/28/2016", "4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016", 
"5/3/2016", "5/4/2016", "5/5/2016", "5/6/2016", "5/7/2016", "5/8/2016", 
"5/9/2016", "5/10/2016", "5/11/2016", "5/12/2016", "4/12/2016", 
"4/13/2016", "4/14/2016", "4/15/2016"), TotalSteps = c(13162, 
10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819, 12764, 
14371, 10039, 15355, 13755, 18134, 13154, 11181, 14673, 10602, 
14727, 15103, 11100, 14070, 12159, 11992, 10060, 12022, 12207, 
12770, 0, 8163, 7007, 9107, 1510), TotalDistance = c(8.5, 6.96999979, 
6.739999771, 6.28000021, 8.159999847, 6.480000019, 8.590000153, 
9.880000114, 6.679999828, 6.340000153, 8.130000114, 9.039999962, 
6.409999847, 9.800000191, 8.789999962, 12.21000004, 8.529999733, 
7.150000095, 9.25, 6.809999943, 9.710000038, 9.659999847, 7.150000095, 
8.899999619, 8.029999733, 7.710000038, 6.579999924, 7.71999979, 
7.769999981, 8.130000114, 0, 5.309999943, 4.550000191, 5.920000076, 
0.9800000191)), row.names = c(NA, -35L), class = c("tbl_df", 
"tbl", "data.frame"))
I tried the ggplot2 code shown above, but it keeps failing with that error.

Upvotes: 1

Views: 513

Answers (1)

Andrea M

Reputation: 2462

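There are two issues that I can spot:

  1. You're using select() instead of filter(). select() picks columns by name, while filter() keeps the rows that satisfy a condition such as Actlevl == "High". Inside select(), Actlevl == "High" is not treated as a column name, hence the "object not found" error.

  2. When you use summarise(), you lose every column that is neither a grouping variable from group_by() nor created inside summarise() itself.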
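A small sketch of both points, using a made-up three-row data frame. The object `toy` and its values are hypothetical; only the column names are taken from the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names match the question
toy <- data.frame(
  Actlevl       = c("High", "High", "Low"),
  ActivityDate  = c("4/12/2016", "4/13/2016", "4/12/2016"),
  TotalDistance = c(8.5, 6.97, 5.31)
)

# select() picks columns by name
cols_only <- toy %>% select(Actlevl, TotalDistance)

# filter() keeps the rows where the condition is TRUE
high_rows <- toy %>% filter(Actlevl == "High")

# summarise() returns only the grouping column plus the new summary column
daily_avg <- toy %>%
  group_by(ActivityDate) %>%
  summarise(average_distance = mean(TotalDistance))

names(daily_avg)
```

Here `names(daily_avg)` lists only ActivityDate and average_distance, so Actlevl and TotalDistance are no longer available to later steps in the pipe.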

This is my attempt at fixing the issue. It works but it's a bit verbose, using right_join and filtering again in order to recover the lost columns. Can anyone make this better?

library(dplyr)    # provides %>%, group_by(), filter(), summarise(), right_join()
library(ggplot2)

dailyActivity_merged_2 %>%
  group_by(ActivityDate) %>%
  filter(Actlevl == "High") %>%    # filter() keeps rows; select() would pick columns
  summarise(average_distance = mean(TotalDistance)) %>%
  # join back to the original data to recover the columns dropped by summarise()
  right_join(dailyActivity_merged_2, by = "ActivityDate") %>%
  filter(Actlevl == "High") %>%
  ggplot() +
  geom_col(mapping = aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")

Output: [image: the resulting bar chart]
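Since the plot only uses ActivityDate and average_distance, one possible simplification is to filter first and skip the join entirely. The sketch below is a suggestion rather than the method from the original answer. `sample_activity` is a hypothetical stand-in for dailyActivity_merged_2 built from a few rows of the question's data, and converting ActivityDate with as.Date() is an added assumption so that the bars sort chronologically instead of alphabetically.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dailyActivity_merged_2 (a few rows from the question)
sample_activity <- data.frame(
  Id            = c(1503960366, 1503960366, 1503960366, 1624580081, 1624580081),
  Actlevl       = c("High", "High", "High", "Low", "Low"),
  ActivityDate  = c("4/12/2016", "4/13/2016", "5/1/2016", "4/12/2016", "4/13/2016"),
  TotalDistance = c(8.5, 6.96999979, 6.809999943, 5.309999943, 4.550000191)
)

high_daily_avg <- sample_activity %>%
  # keep only the "High" rows
  filter(Actlevl == "High") %>%
  # parse the month/day/year text into real dates (added assumption)
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y")) %>%
  group_by(ActivityDate) %>%
  # one row per date
  summarise(average_distance = mean(TotalDistance))

p <- ggplot(high_daily_avg) +
  geom_col(aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")

p
```

Keeping one row per date also matters for the bar heights: geom_col() stacks rows that share the same x value, so a data frame with several rows per date would draw bars taller than the average.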

Upvotes: 1
