Apoorva Hungund

Reputation: 35

Need to plot line plot across one factor variable with two levels

I have a dataset that records details at millisecond resolution, up to a maximum of 20 seconds. I need to plot a value, "Buffer", across a factor variable with two levels, A and B. I'm trying to use geom_line() with time on the x axis, Buffer on the y axis, and one line each for A and B. My problem is that it plots a line through every observation and does not aggregate by factor level. Here is the code I was using:

ggplot(DT, aes(x = Real_Time_Stamp, y = Buffer)) + geom_line(aes(color = FVN))

And here is the plot it is generating: [image: resulting plot]

The dataset I'm dealing with has 49,999 rows and 3 columns. Here is an example:

structure(list(FVN = c("A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A", "A"), Real_Time_Stamp = c(0.015233039855957, 0.0325429439544678, 0.0483760833740234, 0.0653512477874756, 0.0819132328033447, 0.0988430976867676, 0.11584997177124, 0.132710218429565, 0.148336172103882, 0.165808200836182, 0.182291269302368, 0.199646949768066, 0.215576171875, 0.233185052871704, 0.248784303665161, 0.266969203948975, 0.282114028930664, 0.299488067626953, 0.315442323684692, 0.332358121871948), Buffer = c(1.984, 1.968, 1.952, 1.936, 1.936, 1.952, 1.936, 1.92, 1.904, 1.888, 1.872, 1.856, 1.856, 1.872, 1.856, 1.856, 1.84, 1.84, 1.824, 1.824)), row.names = c(NA, 20L), class = "data.frame")

I've used the above code before to generate line plots based on factor levels. What am I doing wrong?

Upvotes: 0

Views: 46

Answers (2)

Apoorva Hungund

Reputation: 35

Based on @stefan's suggestion, I binned the real-time stamp variable. As they mentioned, there is a lot of variation in the Buffer variable and real-time stamp.

Here is the code I used:

ggplot(DT, aes(x = Real_Time_Stamp, y = Buffer)) +
  # one summarised line per FVN level; fun = mean makes the default summary explicit
  geom_line(aes(color = FVN, group = FVN), stat = "summary", fun = mean) +
  scale_x_binned(name = "\nTime (s)", n.breaks = 100, limits = c(0, 20), breaks = seq(0, 20, by = 1)) +
  scale_y_continuous(name = "\nBuffer Values", limits = c(-0.5, 2.5), breaks = seq(0, 2, by = 1))

And here is the plot I got:

[image: resulting plot]
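
For comparison, here is a minimal sketch of the same idea that does the binning in the stat rather than in the scale, assuming ggplot2's stat_summary_bin() is acceptable. It averages Buffer within fixed-width time bins and keeps the x axis continuous. DT_demo is hypothetical stand-in data (not the real dataset), and the 0.5 s bin width is an arbitrary choice.

```r
library(ggplot2)

# Hypothetical stand-in for DT: two FVN levels sampled over 0-20 s
set.seed(1)
n <- 2000
DT_demo <- data.frame(
  FVN = rep(c("A", "B"), each = n),
  Real_Time_Stamp = c(runif(n, 0, 20), runif(n, 0, 20)),
  Buffer = c(rnorm(n, 1.25, 0.05), rnorm(n, 1.75, 0.05))
)

# Mean Buffer per 0.5 s bin, drawn as one line per FVN level
p <- ggplot(DT_demo, aes(x = Real_Time_Stamp, y = Buffer, color = FVN)) +
  stat_summary_bin(fun = mean, geom = "line", breaks = seq(0, 20, by = 0.5)) +
  labs(x = "Time (s)", y = "Mean Buffer")

p
```

A wider bin gives a smoother line at the cost of time resolution, so the bin width probably needs tuning against the real data.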

Upvotes: 0

pachadotdev

Reputation: 3775

I don't fully understand the question, but here is my attempt.

I start by assuming that group A has mean 1.25 and group B has mean 1.75. Why? Because your Buffer values are all of the form 1.x.

Then I create fake data with 50k observations and aggregate it to plot one line per group, which I think is the step you missed. The result is a curious piece of "art".

library(ggplot2)
library(dplyr)

# fake data
n <- 25000
set.seed(42)
DT <- data.frame(
  FVN = rep(c("A", "B"), each = n),
  Real_Time_Stamp = rep(seq(0, 20, length.out = n), 2),
  Buffer = c(rnorm(n, mean = 1.25, sd = 0.05), rnorm(n, mean = 1.75, sd = 0.05))
)

# average buffer per time point per fvn level
DT_summary <- DT %>%
  group_by(FVN, Real_Time_Stamp) %>%
  summarise(Buffer = mean(Buffer), .groups = 'drop')

ggplot(DT_summary, aes(x = Real_Time_Stamp, y = Buffer, color = FVN)) +
  geom_line() +
  labs(title = "Buffer Over Time by FVN Level", x = "Real Time Stamp (seconds)", y = "Buffer") +
  theme_minimal()

result:

[image: resulting plot]
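
One caveat with the fake data above: every (FVN, Real_Time_Stamp) pair occurs exactly once, so grouping by the raw timestamp leaves one row per observation and nothing is actually averaged. A rough sketch of one way around that is to round the timestamps into bins before summarising. The 0.1 s bin width and the names bin_width, Time_Bin, per_stamp and DT_binned are assumptions made for illustration.

```r
library(ggplot2)
library(dplyr)

# Same fake data as above
n <- 25000
set.seed(42)
DT <- data.frame(
  FVN = rep(c("A", "B"), each = n),
  Real_Time_Stamp = rep(seq(0, 20, length.out = n), 2),
  Buffer = c(rnorm(n, mean = 1.25, sd = 0.05), rnorm(n, mean = 1.75, sd = 0.05))
)

# Grouping on the raw timestamp keeps every row, because each timestamp is unique per level
per_stamp <- DT %>%
  group_by(FVN, Real_Time_Stamp) %>%
  summarise(Buffer = mean(Buffer), .groups = "drop")

# Assumed bin width in seconds; each timestamp is floored to the start of its bin
bin_width <- 0.1
DT_binned <- DT %>%
  mutate(Time_Bin = floor(Real_Time_Stamp / bin_width) * bin_width) %>%
  group_by(FVN, Time_Bin) %>%
  summarise(Buffer = mean(Buffer), .groups = "drop")

# One averaged line per FVN level
p_binned <- ggplot(DT_binned, aes(x = Time_Bin, y = Buffer, color = FVN)) +
  geom_line() +
  labs(x = "Time (s), binned", y = "Mean Buffer")

p_binned
```

With the real data the right bin width depends on how densely the timestamps are sampled, so treat 0.1 s as a starting point only.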

Upvotes: 0
