Ken Reid

Reputation: 609

Create boxplots from separate data frames with different number of rows

I'm trying to build a data frame and then create a boxplot from it. The data frame should contain 3 vectors of different lengths. Say the data are currently in a$data, b$data and c$data, with lengths 7, 50 and 200.

Here is a simplified version of my code, where the cbind step errors:

# create initial df
df <- data.frame()
# set column names
colnames(df) <- c("a", "b", "c")
# bind original data to new data frame:
df <- cbind(df, a$data, b$data, c$data)
# draw boxplot
boxplot(df)

Upvotes: 1

Views: 878

Answers (3)

Henrik

Reputation: 67778

A data.frame cannot "contain [...] vectors of varying sizes". But a list can, e.g.

l = list(x = rnorm(5, 2), y = rnorm(10, 3), z = rnorm(20, 1))

And boxplot happily eats lists:

boxplot(l)
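
Applied to the question's data, this might look like the sketch below. It assumes a, b and c are lists (or data frames) with a numeric data element; the values here are simulated stand-ins, not the asker's real data.

```r
# Hypothetical stand-ins for the question's objects (lengths 7, 50, 200)
set.seed(1)
a <- list(data = rnorm(7))
b <- list(data = rnorm(50))
c <- list(data = rnorm(200))

# A named list keeps each vector at its own length
l <- list(a = a$data, b = b$data, c = c$data)
lengths(l)

# One box per list element, labelled with the element names
bp <- boxplot(l)
```

boxplot() invisibly returns the statistics it computed, so bp$n and bp$names can be used to check that each group was picked up with the expected size.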

Upvotes: 3

r2evans

Reputation: 160417

I tend to think of groups of boxplots in a "long-data" sense.

Fake data:

set.seed(2021)
df1 <- data.frame(x=runif(10)); df2 <- data.frame(x=runif(20)); df3 <- data.frame(x=runif(100))
head(df1,3); head(df2,3); head(df3,3)
#           x
# 1 0.4512674
# 2 0.7837798
# 3 0.7096822
#            x
# 1 0.02726706
# 2 0.83749040
# 3 0.60324073
#            x
# 1 0.03277595
# 2 0.94270937
# 3 0.94773844

Combine into one long frame:

# tidyverse
dfall <- dplyr::bind_rows(dplyr::lst(df1, df2, df3), .id = "id")

# data.table
dfall <- data.table::rbindlist(list(df1=df1, df2=df2, df3=df3), idcol = "id")

# base R
lst_of_frames <- list(df1=df1, df2=df2, df3=df3)
# add an id column holding each frame's name, then row-bind
lst_of_frames <- Map(function(x,nm) transform(x, id = nm), lst_of_frames, names(lst_of_frames))
dfall <- do.call(rbind, lst_of_frames)

Plot:

boxplot(x ~ id, data = dfall)
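
To confirm the long frame has the expected shape, a quick check along these lines may help. This is only a sketch: it rebuilds the fake data with a base R approach and counts the rows per id.

```r
set.seed(2021)
df1 <- data.frame(x = runif(10))
df2 <- data.frame(x = runif(20))
df3 <- data.frame(x = runif(100))

lst_of_frames <- list(df1 = df1, df2 = df2, df3 = df3)
# Tag every row with the name of the frame it came from, then stack the frames
lst_of_frames <- Map(function(x, nm) transform(x, id = nm),
                     lst_of_frames, names(lst_of_frames))
dfall <- do.call(rbind, lst_of_frames)

str(dfall)        # two columns: x (values) and id (group label)
table(dfall$id)   # row count per original frame

boxplot(x ~ id, data = dfall)
```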


Upvotes: 2

Rui Barradas

Reputation: 76402

Try the following code and see if it solves the problem.
It creates a data set in long format, with one column, variable, holding the group labels "a", "b" and "c", and another, value, holding the corresponding values. It then plots the data using the formula interface.

variable <- rep(c("a", "b", "c"), 
                c(length(a$data),length(b$data), length(c$data)))
value <- c(a$data, b$data, c$data)

df <- data.frame(variable, value)
boxplot(value ~ variable, data = df)
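
As a possible shortcut, base R's stack() builds the same kind of long data frame from a named list in one call. The sketch below uses simulated stand-ins for a$data, b$data and c$data (assumed here to be numeric vectors of lengths 7, 50 and 200); note that stack() names its columns values and ind rather than value and variable.

```r
# Hypothetical stand-ins for the question's objects
set.seed(1)
a <- list(data = rnorm(7))
b <- list(data = rnorm(50))
c <- list(data = rnorm(200))

# stack() returns a data frame with `values` and `ind` (a factor of list names)
df <- stack(list(a = a$data, b = b$data, c = c$data))
str(df)

boxplot(values ~ ind, data = df)
```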

Upvotes: 2
