Syed Ahmed

Reputation: 209

ECDF plot for all columns in the dataframe in R

I need to plot the ECDFs of all columns of a data frame in a single plot, and I also want to set a fixed x-axis limit.

The function that I wrote:

library(lattice)
library(latticeExtra)

ecdf_plot <- function(data) {

  # Drop columns with only NAs
  data <- data[, colSums(is.na(data)) != nrow(data)]
  data$P_key <- NULL

  ecdfplot(~ S12, data = data, auto.key = list(space = 'right'))
}

Problem: The function above only plots the ECDF for the column S12, but I need it for all columns in the data frame. I know I can write S12 + S13 + ..., but the source data changes, and we don't know in advance how many columns the data frame will have or what they will be called. Is there a better way to do this? Also, is it possible to restrict the x-axis of the combined plot to a single range, like xlim(0, 100)?

Upvotes: 0

Views: 324

Answers (1)

Allan Cameron

Reputation: 174506

I think this task would be easier using ggplot2, which makes it straightforward to set the axis limits as required, customise the appearance, etc.

The function would look like this:

library(dplyr)
library(tidyr)
library(ggplot2)

ecdf_plot <- function(data) {

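  # Drop columns that contain only NAs (drop = FALSE keeps a data frame
  # even if a single column remains)
  data[, colSums(is.na(data)) != nrow(data), drop = FALSE] %>%
    # Reshape to long format: one row per (column name, value) pair
    pivot_longer(everything(), values_drop_na = TRUE) %>%
    group_by(name) %>%
    # Sort the values within each original column
    arrange(value, .by_group = TRUE) %>%
    # The ECDF at the i-th sorted value is i / n
    mutate(ecdf = row_number() / n()) %>%
    ggplot(aes(x = value, y = ecdf, colour = name)) +
    xlim(0, 100) +
    geom_step() +
    theme_bw()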
}

Now let's test it on a random data frame:

set.seed(69)

df <- data.frame(unif = runif(100, 0, 100), norm = rnorm(100, 50, 25))

ecdf_plot(df)
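
As an alternative to computing the ECDF by hand, ggplot2 can compute it per group with stat_ecdf(). The following is only a sketch under a few assumptions: the helper name ecdf_plot_stat and its xlim argument are made up for illustration, and all columns are assumed to be numeric. It uses coord_cartesian() rather than xlim() because xlim() discards out-of-range values before the statistic is computed, which would change the shape of the curves, whereas coord_cartesian() only zooms the view.

```r
library(tidyr)
library(ggplot2)

# Hypothetical helper (not from the original post); the default x range
# of c(0, 100) mirrors the xlim(0, 100) asked for in the question.
ecdf_plot_stat <- function(data, xlim = c(0, 100)) {
  # Keep only columns that have at least one non-NA value
  keep <- colSums(!is.na(data)) > 0

  # Long format: one row per (column name, value) pair, NAs dropped
  long <- pivot_longer(data[, keep, drop = FALSE], everything(),
                       values_drop_na = TRUE)

  ggplot(long, aes(x = value, colour = name)) +
    stat_ecdf(geom = "step") +        # ECDF computed separately per colour group
    coord_cartesian(xlim = xlim) +    # zoom without discarding data
    labs(y = "ecdf") +
    theme_bw()
}

set.seed(69)
df <- data.frame(unif = runif(100, 0, 100), norm = rnorm(100, 50, 25))
p <- ecdf_plot_stat(df)
p
```

Because the column names are taken from the data frame itself, nothing needs to change when the set of columns changes.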

Upvotes: 1
