Gero

Reputation: 157

Plot classified by categories with column-names (R)

I have a data frame with the following structure:

D1A1    D1A2    D1A3    D1B1    D1B2    D1B3    D2A1    D2A2    D2A3    D2B1    D2B2    D2B3
 10      12      15      40      39      27      11      13      14      33      31      32

The actual data frame is larger (40 observations / columns). I would like to create any kind of plot that shows all the numerical information together, with the data grouped by column classification (D1A, D1B, D2A, D2B), as follows:

               D1A1+D1A2+D1A3 || D1B1+D1B2+D1B3 || D2A1+D2A2+D2A3 || D2B1+D2B2+D2B3

Since I feel extremely lost, any suggestion would be appreciated.

Upvotes: 1

Views: 560

Answers (1)

akrun

Reputation: 887038

We can split the dataset by the column names with the trailing digits removed, loop over the resulting list to get the rowSums of each group, and pass the result to barplot.

out <- sapply(split.default(df1, sub("\\d+$", "", names(df1))), 
             rowSums, na.rm = TRUE)
barplot(out)
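
To make the grouping step concrete, here is a minimal sketch that runs the same idea on the two-row df2 from the data section and prints the intermediate objects. The names grp and totals are only illustrative, and collapsing the rows with colSums is one possible choice for getting a single bar per group, not something the approach requires.

```r
# Toy data: the two-row `df2` from the answer's data section.
df2 <- data.frame(
  D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = c(15L, 14L),
  D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L),
  D2A1 = c(11L, 10L), D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
  D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L)
)

# Grouping key: each column name with its trailing digits removed.
grp <- sub("\\d+$", "", names(df2))
print(unique(grp))

# One column of row sums per group (rows = observations, columns = groups).
out <- sapply(split.default(df2, grp), rowSums, na.rm = TRUE)
print(out)

# Collapse the rows to get a single total per group, then plot (assumed choice).
totals <- colSums(out)
barplot(totals, xlab = "Column group", ylab = "Sum")
```

With more than one row, out is a matrix, so barplot(out) would draw one stacked bar per group with one segment per row.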

If there are more rows, we can use the tidyverse instead. First, reshape into 'long' format with pivot_longer, using the pattern in the column names, i.e. capturing the part of each column name before the trailing digits. This creates 4 columns (D1A, D1B, D2A, D2B). Then use summarise with across to get the sum of each column, reshape to long format once more, and draw a bar plot with geom_col.

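library(dplyr)
library(tidyr)
library(ggplot2)
df2 %>% 
    # strip the trailing digits so D1A1, D1A2, D1A3 all land in one column, D1A
    pivot_longer(cols = everything(), names_to = ".value", 
        names_pattern = "(.*)\\d+$") %>% 
    # total of each group column
    summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>% 
    # back to long format: one row per group (columns: name, value)
    pivot_longer(cols = everything()) %>% 
    ggplot(aes(x = name, y = value, fill = name)) + 
        geom_col()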
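
To see what the first pivot_longer step produces before anything is summed, the following sketch stops the pipeline after that step and inspects the result. It assumes tidyr is installed and reuses the two-row df2 from the data section; the name long is only illustrative.

```r
library(tidyr)

# Toy data: the two-row `df2` from the answer's data section.
df2 <- data.frame(
  D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = c(15L, 14L),
  D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L),
  D2A1 = c(11L, 10L), D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
  D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L)
)

# ".value" tells pivot_longer to use the captured prefix as the output
# column name, so the 12 input columns collapse into one column per group.
long <- pivot_longer(df2, cols = everything(), names_to = ".value",
                     names_pattern = "(.*)\\d+$")

print(names(long))    # the group columns
print(dim(long))      # 2 rows x 3 columns per group stacked into 6 rows
print(colSums(long))  # the totals that the summarise step computes
```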

If we are interested in the spread of the data, a boxplot can help. Here we skip the summarise step and use geom_boxplot instead of geom_col.

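df2 %>%
    # one column per group (D1A, D1B, D2A, D2B), all values kept
    pivot_longer(cols = everything(), names_to = ".value", 
        names_pattern = "(.*)\\d+$") %>%
    # long format: one row per value (columns: name, value)
    pivot_longer(cols = everything()) %>%
    ggplot(aes(x = name, y = value, fill = name)) + 
        geom_boxplot()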
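
If installing the tidyverse is not an option, roughly the same boxplot can be sketched in base R. This is an alternative route rather than the method above: it assumes stack() to reach long format, and the names long and group are illustrative.

```r
# Toy data: the two-row `df2` from the answer's data section.
df2 <- data.frame(
  D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = c(15L, 14L),
  D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L),
  D2A1 = c(11L, 10L), D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
  D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L)
)

# stack() returns a long data frame with columns `values` and `ind`,
# where `ind` holds the original column name of each value.
long <- stack(df2)

# Derive the group by removing the trailing digits from the column name.
long$group <- sub("\\d+$", "", as.character(long$ind))

# One box per group, drawn from all values in that group's columns.
boxplot(values ~ group, data = long, xlab = "Column group", ylab = "Value")
```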

data

df1 <- structure(list(D1A1 = 10L, D1A2 = 12L, D1A3 = 15L, D1B1 = 40L, 
    D1B2 = 39L, D1B3 = 27L, D2A1 = 11L, D2A2 = 13L, D2A3 = 14L, 
    D2B1 = 33L, D2B2 = 31L, D2B3 = 32L), class = "data.frame", row.names = c(NA, 
-1L))

df2 <- structure(list(D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = 15:14, 
    D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L
    ), D2A1 = 11:10, D2A2 = c(13L, 15L), D2A3 = c(14L, 17L), 
    D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L
    )), class = "data.frame", row.names = c(NA, -2L))

Upvotes: 1
