Reputation: 11
I'm new to R programming for data analysis.
I'm trying to build a project with a dataset named "all_trips_v2", taken from public datasets.
I want a bar chart that shows only the 10 values of "start_station_name" with the highest total counts, drawn with ggplot2 and geom_bar(), with each bar split by member type (member_casual).
I ran this code:
ggplot(all_trips_v2, aes(start_station_name, fill = member_casual)) +
  geom_bar()
As you can see, the result has a lot of bars, one for each "start_station_name". I just need to keep the 10 start stations with the highest counts. Please give me some advice. Thank you so much.
I expected to create a bar chart like this:
Upvotes: 0
Views: 131
Reputation: 13863
I don't know of a good way to do this directly in "one step", but it should be easier to follow in two steps anyway. Step 1: summarize your dataset by count and sort it in descending order. Step 2: filter the dataset to keep only the categories found in the first X rows of that summary.
Here's an example with the built-in chickwts dataset:
library(ggplot2)
df <- chickwts
ggplot(df, aes(feed)) + geom_bar() +
theme_classic()
To only draw the top 3 bars, you could do the two-step process:
library(dplyr)
# STEP 1: summarize by feed count & arrange
df_counts <- df %>%
  count(feed) %>%      # creates column n with counts for feed
  arrange(desc(n))     # arrange descending by n
# STEP 2: plot with a filtered dataset (keep only the 3 most frequent feeds)
ggplot(df %>% dplyr::filter(feed %in% df_counts$feed[1:3]),
       aes(feed)) +
  geom_bar() + theme_classic()
For OP's case, maybe the following would work?
# STEP 1
all_summary <- all_trips_v2 %>%
  count(start_station_name) %>%
  arrange(-n)
# STEP 2
ggplot(
  all_trips_v2 %>%
    dplyr::filter(start_station_name %in% all_summary$start_station_name[1:10]),
  aes(start_station_name, fill = member_casual)) +
  geom_bar()
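One thing the filtering alone does not do is sort the bars: they will most likely still appear in alphabetical (or factor-level) order. If you also want them ordered by count, here is a rough sketch of one way to do it. I don't have all_trips_v2, so this uses ggplot2's built-in mpg dataset as a stand-in: `manufacturer` plays the role of start_station_name and `drv` the role of member_casual. The names `top_levels` and `mpg_top` are just ones I made up for illustration.

```r
library(dplyr)
library(ggplot2)

# Stand-in data (assumption): mpg$manufacturer ~ start_station_name,
# mpg$drv ~ member_casual. Swap in your own columns.

# STEP 1: names of the 3 most frequent categories, most frequent first
top_levels <- mpg %>%
  count(manufacturer, sort = TRUE) %>%  # column n, sorted descending
  head(3) %>%
  pull(manufacturer)

# STEP 2: keep only those rows and fix the bar order via factor levels
mpg_top <- mpg %>%
  filter(manufacturer %in% top_levels) %>%
  mutate(manufacturer = factor(manufacturer, levels = top_levels))

# Stacked counts, bars ordered from most to least frequent
p <- ggplot(mpg_top, aes(manufacturer, fill = drv)) +
  geom_bar()
print(p)
```

For your data, replacing `head(3)` with `head(10)` should give the top 10 stations. If by "proportion" you mean each bar scaled to 100%, `geom_bar(position = "fill")` draws the within-bar shares instead of raw counts.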
Upvotes: 0