martinsarif

Reputation: 105

Constructing a Boxplot from a dataframe consisting of Multivalue columns

Suppose we have a data frame in which one of the columns holds, for each row, a comma-separated list of numeric values:

"ID","Costs"
"tim","1, 2, 3, 4, 5, 6, 7, 8"
"ryan","8, 7, 6, 5, 4, 3, 2, 1"
"bob","1, 3, 5, 7, 9, 11, 13, 15"

If I wanted to construct a box plot of Costs with respect to ID, how would I approach doing so?

Upvotes: 0

Views: 67

Answers (3)

thelatemail

Reputation: 93813

A base R solution is pretty much a one-liner, since boxplot() will accept a list as input:

boxplot(lapply(strsplit(dat$Costs, ",\\s+"), as.numeric), names=dat$ID)


dat in this case being:

dat <- structure(list(ID = c("tim", "ryan", "bob"), Costs = c("1, 2, 3, 4, 5, 6, 7, 8", 
"8, 7, 6, 5, 4, 3, 2, 1", "1, 3, 5, 7, 9, 11, 13, 15")), .Names = c("ID", 
"Costs"), class = "data.frame", row.names = c(NA, -3L))
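To see what the one-liner is doing, it can help to split it into steps. This is only a sketch: dat is rebuilt here so the block runs on its own, cost_list is an illustrative name, and the pattern ",\\s*" is a small variation on the one above that also tolerates a missing space after the comma.

```r
# Same data as above, rebuilt so this block is self-contained
dat <- data.frame(
  ID    = c("tim", "ryan", "bob"),
  Costs = c("1, 2, 3, 4, 5, 6, 7, 8",
            "8, 7, 6, 5, 4, 3, 2, 1",
            "1, 3, 5, 7, 9, 11, 13, 15"),
  stringsAsFactors = FALSE
)

# Step 1: split each string on a comma plus optional whitespace
pieces <- strsplit(dat$Costs, ",\\s*")

# Step 2: convert each character vector to numeric
cost_list <- lapply(pieces, as.numeric)

# Step 3: name the list elements so boxplot() labels each box by ID
names(cost_list) <- dat$ID
str(cost_list)

# boxplot() draws one box per list element
boxplot(cost_list, xlab = "ID", ylab = "Costs")
```

Naming the list is equivalent to passing names=dat$ID; it just keeps the labels attached to the data if you reuse cost_list elsewhere.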

Upvotes: 2

IRTFM

Reputation: 263352


If you want a base solution, here's one possibility (df1 is the data frame with the ID and Costs columns):

boxplot( values~ind, 
       data=stack( data.frame( apply(df1, 1, # stack function converts wide to long
                function(r) setNames( 
                                list(scan(text=r[2], sep=",")), # numeric Costs
                                r[1]) ) )) )  # names then as 'ID'
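The nested call above may be easier to follow with the long-format step pulled out on its own. A sketch, assuming df1 holds ID and Costs as character columns; costs and long are illustrative names, and strsplit() is swapped in for scan() here just to avoid the "Read 8 items" messages.

```r
# Same data as in the question, rebuilt so this block is self-contained
df1 <- data.frame(
  ID    = c("tim", "ryan", "bob"),
  Costs = c("1, 2, 3, 4, 5, 6, 7, 8",
            "8, 7, 6, 5, 4, 3, 2, 1",
            "1, 3, 5, 7, 9, 11, 13, 15"),
  stringsAsFactors = FALSE
)

# Named list of numeric vectors, one element per ID
# (as.numeric() ignores the leading space left after splitting on ",")
costs <- setNames(lapply(strsplit(df1$Costs, ","), as.numeric), df1$ID)

# stack() turns the named list into long format:
# 'values' holds the numbers, 'ind' holds the originating name
long <- stack(costs)
head(long)

# The formula interface then draws one box per level of 'ind'
boxplot(values ~ ind, data = long, xlab = "ID", ylab = "Costs")
```

Having the long data frame in hand is also convenient if you want to reuse it for summaries or other plots.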

Upvotes: 1

neilfws

Reputation: 33782

Assuming that the data are as given in your example, i.e. column Costs contains quoted characters separated by comma + space:

df1 <- read.csv(text = '"ID","Costs"
"tim","1, 2, 3, 4, 5, 6, 7, 8"
"ryan","8, 7, 6, 5, 4, 3, 2, 1"
"bob","1, 3, 5, 7, 9, 11, 13, 15"', 
header = TRUE, 
stringsAsFactors = FALSE)

Then you can separate the values using unnest, convert to numeric and plot:

library(tidyverse)
df1 %>% 
  mutate(Costs = str_split(Costs, ", ")) %>%  # list-column of character vectors
  unnest(Costs) %>%                           # one row per value
  mutate(Costs = as.numeric(Costs)) %>% 
  ggplot(aes(ID, Costs)) + 
    geom_boxplot()
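As a possible alternative, tidyr's separate_rows() can split and lengthen in one step. This is a sketch rather than the definitive approach: it assumes tidyr and ggplot2 are installed, long2 and p are illustrative names, and recent tidyr versions mark separate_rows() as superseded (it still works).

```r
library(tidyr)
library(ggplot2)

# Same data as above, rebuilt so this block is self-contained
df1 <- data.frame(
  ID    = c("tim", "ryan", "bob"),
  Costs = c("1, 2, 3, 4, 5, 6, 7, 8",
            "8, 7, 6, 5, 4, 3, 2, 1",
            "1, 3, 5, 7, 9, 11, 13, 15"),
  stringsAsFactors = FALSE
)

# Split Costs on ", " into one row per value;
# convert = TRUE turns the resulting strings into numbers
long2 <- separate_rows(df1, Costs, sep = ", ", convert = TRUE)

# One box per ID
p <- ggplot(long2, aes(ID, Costs)) +
  geom_boxplot()
print(p)
```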


Upvotes: 2
