Reputation: 157
I have a data frame with the following structure:
D1A1 D1A2 D1A3 D1B1 D1B2 D1B3 D2A1 D2A2 D2A3 D2B1 D2B2 D2B3
10 12 15 40 39 27 11 13 14 33 31 32
The actual data frame is larger (40 observations / columns). I would like to create any kind of plot that shows all the numerical values together, with the data grouped by column class (D1A, D1B, D2A, D2B) as follows:
D1A1+D1A2+D1A3 || D1B1+D1B2+D1B3 || D2A1+D2A2+D2A3 || D2B1+D2B2+D2B3
Since I feel completely lost, any suggestion would be appreciated.
Upvotes: 1
Views: 560
Reputation: 887038
We can split the dataset by the column-name prefix (the name with its trailing digits removed), loop over the resulting list, get the rowSums of each group, and plot the result with barplot:
out <- sapply(split.default(df1, sub("\\d+$", "", names(df1))),
              rowSums, na.rm = TRUE)
barplot(out)
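To see which numbers end up in the bars, the grouping step can be run on its own. The following is a minimal sketch, assuming the one-row data from the question; the names prefix and sums are illustrative and not part of the answer above, and it totals each group with sum rather than rowSums, which gives the same result when there is a single row.

```r
# One-row example data, same values as in the question
df1 <- data.frame(D1A1 = 10, D1A2 = 12, D1A3 = 15,
                  D1B1 = 40, D1B2 = 39, D1B3 = 27,
                  D2A1 = 11, D2A2 = 13, D2A3 = 14,
                  D2B1 = 33, D2B2 = 31, D2B3 = 32)

# Strip the trailing digits to get each column's group label
prefix <- sub("\\d+$", "", names(df1))

# Split the columns by group and total all values in each group
sums <- sapply(split.default(df1, prefix),
               function(g) sum(unlist(g), na.rm = TRUE))
print(sums)

# Draw only in an interactive session, so running this as a script
# does not open a graphics device
if (interactive()) barplot(sums, xlab = "group", ylab = "sum")
```

For this data the totals are D1A = 37, D1B = 106, D2A = 38 and D2B = 96.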
If there are more rows to plot, we can use the tidyverse. First, reshape into 'long' format with pivot_longer, using the pattern in the column names, i.e. capturing the part of each column name without the digits at the end. This creates 4 columns (D1A, D1B, D2A, D2B). Then use summarise with across to get the sum of each column, reshape again so that each group becomes one row, and draw a bar plot with geom_col:
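library(dplyr)
library(tidyr)
library(ggplot2)

df2 %>%
  # stack D1A1, D1A2, D1A3 into a single D1A column (likewise for the other groups)
  pivot_longer(cols = everything(), names_to = ".value",
               names_pattern = "(.*)\\d+$") %>%
  # total of each group; the lambda form avoids passing na.rm through
  # the `...` of across(), which is deprecated in recent dplyr versions
  summarise(across(everything(), ~ sum(.x, na.rm = TRUE))) %>%
  # one row per group, in columns `name` and `value`
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_col()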
If we are interested in the spread of the data, a boxplot can help. Here, we skip the summarise step and use geom_boxplot instead of geom_col:
df2 %>%
  pivot_longer(cols = everything(), names_to = ".value",
               names_pattern = "(.*)\\d+$") %>%
  pivot_longer(cols = everything()) %>%
  ggplot(aes(x = name, y = value, fill = name)) +
  geom_boxplot()
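The same long layout can also be built without the tidyverse. The following is a rough base R sketch, not the method used above: it assumes the two-row example data and uses stack to get one row per cell, then derives the group label from the column name. The names long and group are made up for illustration.

```r
# Two-row example data, same values as df2 in this answer
df2 <- data.frame(D1A1 = c(10, 15), D1A2 = c(12, 23), D1A3 = c(15, 14),
                  D1B1 = c(40, 23), D1B2 = c(39, 14), D1B3 = c(27, 22),
                  D2A1 = c(11, 10), D2A2 = c(13, 15), D2A3 = c(14, 17),
                  D2B1 = c(33, 35), D2B2 = c(31, 35), D2B3 = c(32, 32))

# stack() returns one row per cell: `values` plus the source column name in `ind`
long <- stack(df2)

# Group label = column name without its trailing digits
long$group <- sub("\\d+$", "", as.character(long$ind))

# Number of values available per group
print(table(long$group))

# Draw only in an interactive session, so running this as a script
# does not open a graphics device
if (interactive()) boxplot(values ~ group, data = long)
```

With 2 rows and 3 columns per group, each box is drawn from 6 values.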
df1 <- structure(list(D1A1 = 10L, D1A2 = 12L, D1A3 = 15L, D1B1 = 40L,
D1B2 = 39L, D1B3 = 27L, D2A1 = 11L, D2A2 = 13L, D2A3 = 14L,
D2B1 = 33L, D2B2 = 31L, D2B3 = 32L), class = "data.frame", row.names = c(NA,
-1L))
df2 <- structure(list(D1A1 = c(10L, 15L), D1A2 = c(12L, 23L), D1A3 = 15:14,
D1B1 = c(40L, 23L), D1B2 = c(39L, 14L), D1B3 = c(27L, 22L
), D2A1 = 11:10, D2A2 = c(13L, 15L), D2A3 = c(14L, 17L),
D2B1 = c(33L, 35L), D2B2 = c(31L, 35L), D2B3 = c(32L, 32L
)), class = "data.frame", row.names = c(NA, -2L))
Upvotes: 1