Reputation: 11
I am learning R as a beginner and am trying to generate a plot today by using the following code:
> dailyActivity_merged_2 %>%
+ group_by(ActivityDate) %>%
+ select(Actlevl == "High") %>%
+ summarise(average_distance = mean(TotalDistance)) %>%
+ ggplot() + geom_col(mapping= aes(x=ActivityDate, y=average_distance, fill = average_distance)) + scale_fill_gradient(low = "yellow", high = "red") +
+ theme(axis.text.x = element_text(angle = 90)) +
+ labs(title="Average Distance vs. Time")
This returns the following error, although I am sure the column I want in the dataset is named "Actlevl". I am not sure why it keeps saying the object is not found:
Error in select(Actlevl == "High") : object 'Actlevl' not found
Did I do something wrong? Maybe I should not use select() to choose the data value? I am trying to select the rows with "High" in column Actlevl.
Thank you so much for your help.
Subset data example:
> dput(dailyActivity_merged_2[1:35,c(1:5)])
structure(list(Id = c(1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1503960366, 1503960366, 1503960366,
1503960366, 1503960366, 1503960366, 1624580081, 1624580081, 1624580081,
1624580081), Actlevl = c("High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "High", "High", "High", "High", "High", "High",
"High", "High", "Low", "Low", "Low", "Low"), ActivityDate = c("4/12/2016",
"4/13/2016", "4/14/2016", "4/15/2016", "4/16/2016", "4/17/2016",
"4/18/2016", "4/19/2016", "4/20/2016", "4/21/2016", "4/22/2016",
"4/23/2016", "4/24/2016", "4/25/2016", "4/26/2016", "4/27/2016",
"4/28/2016", "4/29/2016", "4/30/2016", "5/1/2016", "5/2/2016",
"5/3/2016", "5/4/2016", "5/5/2016", "5/6/2016", "5/7/2016", "5/8/2016",
"5/9/2016", "5/10/2016", "5/11/2016", "5/12/2016", "4/12/2016",
"4/13/2016", "4/14/2016", "4/15/2016"), TotalSteps = c(13162,
10735, 10460, 9762, 12669, 9705, 13019, 15506, 10544, 9819, 12764,
14371, 10039, 15355, 13755, 18134, 13154, 11181, 14673, 10602,
14727, 15103, 11100, 14070, 12159, 11992, 10060, 12022, 12207,
12770, 0, 8163, 7007, 9107, 1510), TotalDistance = c(8.5, 6.96999979,
6.739999771, 6.28000021, 8.159999847, 6.480000019, 8.590000153,
9.880000114, 6.679999828, 6.340000153, 8.130000114, 9.039999962,
6.409999847, 9.800000191, 8.789999962, 12.21000004, 8.529999733,
7.150000095, 9.25, 6.809999943, 9.710000038, 9.659999847, 7.150000095,
8.899999619, 8.029999733, 7.710000038, 6.579999924, 7.71999979,
7.769999981, 8.130000114, 0, 5.309999943, 4.550000191, 5.920000076,
0.9800000191)), row.names = c(NA, -35L), class = c("tbl_df",
"tbl", "data.frame"))
I tried to write the ggplot2 code as above, but it keeps throwing an error.
Upvotes: 1
Views: 513
Reputation: 2462
There are two issues that I can spot:
You're using select() instead of filter(). select() picks columns, while filter() picks rows that match a condition.
When you use summarise(), you lose all previous columns that are not listed in group_by().
This is my attempt at fixing the issue. It works, but it's a bit verbose: it uses right_join() and filters again in order to recover the lost columns. Can anyone make this better?
library(dplyr)    # provides %>%, group_by(), filter(), summarise(), right_join()
library(ggplot2)
dailyActivity_merged_2 %>%
  group_by(ActivityDate) %>%
  filter(Actlevl == "High") %>%                               # keep only the "High" rows
  summarise(average_distance = mean(TotalDistance)) %>%       # one row per date
  right_join(dailyActivity_merged_2, by = "ActivityDate") %>% # bring back the other columns
  filter(Actlevl == "High") %>%                               # the join re-adds non-"High" rows
  ggplot() +
  geom_col(mapping = aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")
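A possibly simpler variant, offered as a sketch rather than a definitive fix: the plot only uses ActivityDate and average_distance, so the join may not be needed at all. The small `activity` data frame below is made up for illustration (it reuses the question's column names), and converting ActivityDate with as.Date() is an added assumption so that the bars sort chronologically instead of alphabetically.

```r
library(dplyr)
library(ggplot2)

# Made-up sample data with the same column names as the question's dataset
activity <- data.frame(
  Id            = c(1, 1, 2, 2, 3),
  Actlevl       = c("High", "High", "High", "Low", "Low"),
  ActivityDate  = c("4/12/2016", "4/13/2016", "4/12/2016", "4/13/2016", "4/12/2016"),
  TotalDistance = c(8.5, 7.0, 6.5, 5.3, 4.5)
)

high_daily <- activity %>%
  filter(Actlevl == "High") %>%                                          # rows, not columns
  mutate(ActivityDate = as.Date(ActivityDate, format = "%m/%d/%Y")) %>%  # text -> Date
  group_by(ActivityDate) %>%
  summarise(average_distance = mean(TotalDistance))                      # one row per date

# high_daily now holds only ActivityDate and average_distance,
# which are the two columns the plot needs
p <- ggplot(high_daily) +
  geom_col(aes(x = ActivityDate, y = average_distance, fill = average_distance)) +
  scale_fill_gradient(low = "yellow", high = "red") +
  theme(axis.text.x = element_text(angle = 90)) +
  labs(title = "Average Distance vs. Time")

print(p)
```

Keeping one row per date also matters for geom_col(): it stacks rows that share an x value, so plotting the joined data (several rows per date) would show sums of the average rather than the average itself.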
Upvotes: 1