Reputation: 1
user_a - 3
user_b - 4
user_c - 1
user_d - 4
I want to show the distribution of the number of tweets per author in R using a histogram. The original file has 1048575 such rows.
I tried:
hist(df$twitter_count, nrow(df))
but I don't think it's correct.
Upvotes: 0
Views: 6268
Reputation: 783
Since you said distribution for 'each user', I think it should be a bar plot:
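library(data.table)
# Read the sample rows; splitting on spaces gives
# V1 (user), V2 ("-") and V3 (tweet count)
dat <- fread(text = c("user_a - 3",
                      "user_b - 4",
                      "user_c - 1",
                      "user_d - 4"),
             sep = " ", header = FALSE)
# One bar per user, bar height = tweet count
barplot(height = as.numeric(dat$V3), names.arg = dat$V1)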
Or, if you are looking for a histogram:
hist(as.numeric(dat$V3), xlab = "Tweets per user", main = "Histogram")
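For the distribution of tweet counts across users (how many users have 1 tweet, how many have 2, and so on), a frequency table of the counts can also be plotted directly. This is a minimal base R sketch; the four sample values from the question are hard-coded here as an assumption, and on the real data you would pass the count column instead.

```r
# Tweet counts per user, taken from the sample in the question
tweet_counts <- c(user_a = 3, user_b = 4, user_c = 1, user_d = 4)

# Count how many users have each number of tweets
count_freq <- table(tweet_counts)
print(count_freq)

# One bar per observed tweet count; bar height = number of users
barplot(count_freq,
        xlab = "Tweets per user",
        ylab = "Number of users")
```

Note that table() only lists values that actually occur (here 2 is missing); table(factor(tweet_counts, levels = 1:max(tweet_counts))) would also show the empty counts as zero-height bars.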
Upvotes: 0
Reputation: 6222
It seems I misunderstood the question. I think the following could be what the OP is looking for:
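library(ggplot2)
set.seed(1)  # reproducible example data
df <- data.frame(user = letters,
                 twitter_count = sample.int(200, 26))
# One bar per user; geom_col() uses twitter_count as the bar height
ggplot(df, aes(user, twitter_count)) +
  geom_col()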
If you are instead looking for multiple histograms, one per user, replace user below with the respective variable name in your data.frame:
# Example data
df <- data.frame(user = iris$Species,
twitter_count= round(iris[, 1]*10))
# Histograms using ggplot2 package
library(ggplot2)
ggplot(df, aes(x = twitter_count)) +
geom_histogram() + facet_grid(.~user)
If your data contain many Twitter users, one facet per user quickly becomes unreadable, so it is better to use an alternative method to look at the distributions of tweet counts.
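One such alternative, sketched here with the same iris-based example data (Species standing in for the user variable, an assumption for illustration), is a single base R box plot with one box per user. It is only one of several options, but it stays compact as the number of groups grows.

```r
# Example data, as above: Species stands in for the user variable
df <- data.frame(user = iris$Species,
                 twitter_count = round(iris[, 1] * 10))

# One box per user summarising the spread of twitter_count;
# boxplot() also returns the plotted statistics invisibly
bp <- boxplot(twitter_count ~ user, data = df,
              xlab = "User", ylab = "Tweet count")

# Row 3 of bp$stats holds the median drawn for each user
bp$stats[3, ]
```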
Upvotes: 3
Reputation: 305
If each row of the data.frame represents one user, call hist() on the count column directly:
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)
hist(df$twitter_count)
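Because tweet counts are integers, the default breaks may merge neighbouring counts into one bin. Note also that the second positional argument of hist() is breaks, so the call hist(df$twitter_count, nrow(df)) in the question asks for roughly as many cells as there are rows (R treats a single number only as a suggestion). A minimal sketch, assuming the same simulated df, that places breaks at half-integers so each bar covers exactly one tweet count:

```r
set.seed(1)
df <- data.frame(user = letters, twitter_count = rpois(26, lambda = 4) + 1)

# Breaks at x.5 so that each integer count gets its own bin
int_breaks <- seq(min(df$twitter_count) - 0.5,
                  max(df$twitter_count) + 0.5, by = 1)

h <- hist(df$twitter_count, breaks = int_breaks,
          xlab = "Tweets per user", main = "Tweets per user")

# h$mids holds the integer counts, h$counts the number of users at each
data.frame(tweets = h$mids, users = h$counts)
```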
Upvotes: 1