Reputation: 11
I'm attempting to plot a stacked barplot with ggplot2 using this code:
barplot <- ggplot() + geom_bar(aes(y = percentage, x = TBD, fill = TBD), data = charts.data, stat="identity")
I want to create a barplot for my single-cell analysis, which has two conditions: treated and untreated. The barplot should show the percentage of each cell type per condition, so I can see whether the treatment has an effect on the different cell types.
How do I determine the percentage of each cell type in each condition, and then plot it as a barplot?
output of dput(head(comparison))
structure(c(6051L, 1892L, 1133L, 893L, 148L, 868L, 5331L, 3757L,
1802L, 1061L, 2786L, 704L), .Dim = c(6L, 2L), .Dimnames = structure(list(c("Fibroblast", "T cell", "Macrophage", "Stellate", "Acinar", "Endothelial"), c("treated", "untreated")), .Names = c("",
"")), class = "table")
output of dput(head(cell_cycle_data))
structure(list(orig.ident = c("treated", "treated", "treated",
"treated", "treated", "treated"), nCount_RNA = c(1892, 307, 1348,
3699, 4205, 4468), nFeature_RNA = c(960L, 243L, 765L, 1612L,
1341L, 1644L), percent.mt = c(0.211416490486258, 1.62866449511401,
4.45103857566766, 4.4065963773993, 0.0713436385255648, 3.87197851387645
), RNA_snn_res.0.5 = structure(c(11L, 11L, 5L, 6L, 11L, 13L), .Label = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19"), class = "factor"), seurat_clusters = structure(c(11L, 11L, 5L, 6L, 11L, 13L), .Label = c("0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18", "19"), class = "factor"), S.Score = c(0.476893835992198, -0.0200784617568548, -0.0335915198305002, -0.0247184276246385, 0.010785196602457, 0.0190008903712199), G2M.Score = c(0.204441469200986, 0.173804859670862, -0.0313235510969097, -0.0376796363661889, -0.0559526905696905, -0.0122031631356698), Phase = structure(c(3L, 2L, 1L, 1L, 3L, 3L), .Label = c("G1", "G2M", "S"), class = "factor"), old.ident = structure(c(7L,7L, 1L, 4L, 7L, 9L), .Label = c("Fibroblast", "T cell", "Macrophage", "Stellate", "Acinar", "Endothelial", "Tumor", "B cell", "Mast cell", "Ductal", "Islets of Langerhans"), class = "factor")), row.names = c("treated_AAACGCTAGCGGGTTA-1", "treated_AAAGGTAAGTACAGAT-1", "treated_AAAGTGAGTTTGATCG-1", "treated_AAATGGACAAAGTGTA-1",
"treated_AACAAAGGTCGACTTA-1", "treated_AACAGGGTCCTAGCCT-1"), class = "data.frame")
output of dput(tail(cell_cycle_data))
structure(list(orig.ident = c("untreated", "untreated", "untreated",
"untreated", "untreated", "untreated"), nCount_RNA = c(901, 823,
1184, 1835, 1147, 1407), nFeature_RNA = c(482L, 479L, 649L, 1043L,
604L, 709L), percent.mt = c(1.77580466148724, 2.91616038882138,
4.22297297297297, 3.86920980926431, 2.0052310374891, 4.05117270788913
), RNA_snn_res.0.5 = structure(c(7L, 7L, 7L, 14L, 7L, 7L), .Label = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12",
"13", "14", "15", "16", "17", "18", "19"), class = "factor"),
seurat_clusters = structure(c(7L, 7L, 7L, 14L, 7L, 7L), .Label = c("0",
"1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11",
"12", "13", "14", "15", "16", "17", "18", "19"), class = "factor"),
S.Score = c(-0.0320858200243315, 0.0304725660342869, 0.0215996091745327,
0.0384166213301423, 0.144956251122548, -0.0242770509986111
), G2M.Score = c(0.0904224391544142, 0.050148242050667, -0.0178041670730754,
-0.0112596867977946, -0.0519554524339088, -0.0136533184257381
), Phase = structure(c(2L, 2L, 3L, 3L, 3L, 1L), .Label = c("G1",
"G2M", "S"), class = "factor"), old.ident = structure(c(5L,
5L, 5L, 5L, 5L, 5L), .Label = c("Fibroblast", "T cell", "Macrophage",
"Stellate", "Acinar", "Endothelial", "Tumor", "B cell", "Mast cell",
"Ductal", "Islets of Langerhans"), class = "factor")), row.names = c("untreated_TTTGGTTGTCTAATCG-18",
"untreated_TTTGGTTTCCCGAGGT-18", "untreated_TTTGTTGAGAACTGAT-18",
"untreated_TTTGTTGAGCTCGGCT-18", "untreated_TTTGTTGAGTGCCTCG-18",
"untreated_TTTGTTGCACGGTGCT-18"), class = "data.frame")
Upvotes: 0
Views: 5841
Reputation: 16178
Without knowing the structure of your data, it is hard to guess what code will work for your example.
However, assume that for each condition you have a list of individual cells, each labelled with its cell type, as in the following example:
set.seed(123)
Untreated <- data.frame(Cell_Type = sample(LETTERS[1:4],10, replace = TRUE))
Treated <- data.frame(Cell_Type =sample(LETTERS[1:4],25, replace = TRUE))
Cell_Type
1 C
2 C
3 C
4 B
5 C
6 B
... ...
You can use dplyr to first add a Condition column to each data frame and then combine them with bind_rows:
library(dplyr)
Untreated <- Untreated %>% mutate(Condition = "Untreated")
Treated <- Treated %>% mutate(Condition = "Treated")
DF <- bind_rows(Untreated, Treated)
Cell_Type Condition
1 C Untreated
2 C Untreated
3 C Untreated
4 B Untreated
5 C Untreated
6 B Untreated
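As an alternative to the two mutate calls above, bind_rows can add the label column itself: when the data frames are passed as named arguments together with .id, the argument names are stored in a new column. A minimal sketch, reusing the simulated data from above:

```r
library(dplyr)

# Same simulated data as above
set.seed(123)
Untreated <- data.frame(Cell_Type = sample(LETTERS[1:4], 10, replace = TRUE))
Treated   <- data.frame(Cell_Type = sample(LETTERS[1:4], 25, replace = TRUE))

# Named arguments + .id: the argument names become the values of "Condition"
DF <- bind_rows(Untreated = Untreated, Treated = Treated, .id = "Condition")

head(DF)
table(DF$Condition)
```

This avoids having to modify Untreated and Treated before combining them; the rest of the pipeline should work unchanged.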
Then, you can count the number of cells of each type within each condition and express it as a percentage of that condition's total:
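DF <- DF %>%
  group_by(Condition, Cell_Type) %>%
  summarise(Nb = n()) %>%          # cells per type and condition; result stays grouped by Condition
  mutate(C = sum(Nb)) %>%          # total number of cells in each condition
  mutate(percent = Nb / C * 100)   # share of each cell type within its condition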
# A tibble: 7 x 5
# Groups: Condition [2]
Condition Cell_Type Nb C percent
<chr> <chr> <int> <int> <dbl>
1 Treated A 7 25 28.
2 Treated B 7 25 28.
3 Treated C 6 25 24
4 Treated D 5 25 20
5 Untreated A 1 10 10
6 Untreated B 4 10 40
7 Untreated C 5 10 50
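As a cross-check on the dplyr pipeline above, the same per-condition percentages can be obtained in base R with table() and prop.table(). This is only a sketch: the `cells` data frame and its values are made up for illustration and are not taken from the question.

```r
# Toy per-cell data (hypothetical values, for illustration only)
cells <- data.frame(
  Condition = c("Treated", "Treated", "Treated", "Treated", "Untreated", "Untreated"),
  Cell_Type = c("A", "A", "B", "C", "A", "B")
)

# Cross-tabulate cell type (rows) by condition (columns)
counts <- table(cells$Cell_Type, cells$Condition)

# margin = 2 divides each column by its own total,
# so the percentages are computed within each condition
pct <- prop.table(counts, margin = 2) * 100

print(round(pct, 1))
```

Each column of `pct` sums to 100, which is the property the stacked barplot relies on.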
Then, you can plot the results as a stacked bar chart, with one bar per condition and each segment filled according to Cell_Type:
library(ggplot2)
ggplot(DF, aes(x = Condition, y = percent, fill = Cell_Type))+
geom_bar(stat = "identity")+
geom_text(aes(label = paste(percent,"%")), position = position_stack(vjust = 0.5))
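The labels above are built with paste(percent, "%"), which inserts a space before the percent sign and prints however many digits the value happens to have. One possible variant, sketched here with the summary values from the tibble printed above, formats the labels with sprintf() and uses geom_col(), which plots the y values as given, like geom_bar(stat = "identity"). The `label` column is a name introduced for this example.

```r
library(ggplot2)

# Summary values copied from the tibble printed above
DF <- data.frame(
  Condition = c(rep("Treated", 4), rep("Untreated", 3)),
  Cell_Type = c("A", "B", "C", "D", "A", "B", "C"),
  percent   = c(28, 28, 24, 20, 10, 40, 50)
)

# One decimal place, no space before the percent sign
DF$label <- sprintf("%.1f%%", DF$percent)

# Stacked bars with each label centred in its segment
p <- ggplot(DF, aes(x = Condition, y = percent, fill = Cell_Type)) +
  geom_col() +
  geom_text(aes(label = label), position = position_stack(vjust = 0.5))

print(p)
```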
EDIT: Plotting using data provided by the OP
Using the data you provided in your question, you can do:
df <- structure(c(6051L, 1892L, 1133L, 893L, 148L, 868L, 5331L, 3757L,
1802L, 1061L, 2786L, 704L), .Dim = c(6L, 2L), .Dimnames = structure(list(c("Fibroblast", "T cell", "Macrophage", "Stellate", "Acinar", "Endothelial"), c("treated", "untreated")), .Names = c("",
"")), class = "table")
df <- data.frame(df)
Which gives the following dataframe:
Var1 Var2 Freq
1 Fibroblast treated 6051
2 T cell treated 1892
3 Macrophage treated 1133
4 Stellate treated 893
5 Acinar treated 148
6 Endothelial treated 868
7 Fibroblast untreated 5331
8 T cell untreated 3757
9 Macrophage untreated 1802
10 Stellate untreated 1061
11 Acinar untreated 2786
12 Endothelial untreated 704
Then, you can rename the columns and calculate the percentage of each cell type within each condition:
library(dplyr)
DF <- df %>% rename(Cell_Type = Var1, Condition = Var2) %>%
group_by(Condition) %>%
mutate(Percent = Freq / sum(Freq)*100)
# A tibble: 12 x 4
# Groups: Condition [2]
Cell_Type Condition Freq Percent
<fct> <fct> <int> <dbl>
1 Fibroblast treated 6051 55.1
2 T cell treated 1892 17.2
3 Macrophage treated 1133 10.3
4 Stellate treated 893 8.13
5 Acinar treated 148 1.35
6 Endothelial treated 868 7.90
7 Fibroblast untreated 5331 34.5
8 T cell untreated 3757 24.3
9 Macrophage untreated 1802 11.7
10 Stellate untreated 1061 6.87
11 Acinar untreated 2786 18.0
12 Endothelial untreated 704 4.56
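Because the object in the question is already a table of counts, the percentages can also be computed directly with prop.table() before converting to a data frame. A minimal sketch, rebuilding the same counts with as.table() (the names `pct` and `pct_df` are introduced here for illustration):

```r
# Counts from the question, cell types in rows and conditions in columns
comparison <- as.table(matrix(
  c(6051, 1892, 1133, 893, 148, 868,
    5331, 3757, 1802, 1061, 2786, 704),
  ncol = 2,
  dimnames = list(
    c("Fibroblast", "T cell", "Macrophage", "Stellate", "Acinar", "Endothelial"),
    c("treated", "untreated")
  )
))

# margin = 2: divide each column (condition) by its own total
pct <- prop.table(comparison, margin = 2) * 100
print(round(pct, 1))

# Long format for ggplot2: columns Var1 (cell type), Var2 (condition), Percent
pct_df <- as.data.frame(pct, responseName = "Percent")
print(pct_df)
```

The rounded values should agree with the Percent column shown above, for example about 55.1 for treated fibroblasts.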
Then, for the plotting part:
library(ggplot2)
ggplot(DF, aes(x = Condition, y = Percent, fill = Cell_Type))+
geom_bar(stat = "identity")+
geom_text(aes(label = paste(round(Percent,2),"%")), position = position_stack(vjust = 0.5))
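If the counts table is not available yet, it can presumably be built from the per-cell metadata. This sketch assumes that old.ident holds the cell type and orig.ident the condition, as the dput output of cell_cycle_data in the question suggests; the `meta` data frame and its values are made up for illustration.

```r
# Toy per-cell metadata with the same column names as cell_cycle_data
# (values are hypothetical, for illustration only)
meta <- data.frame(
  orig.ident = c("treated", "treated", "treated", "untreated", "untreated"),
  old.ident  = c("Fibroblast", "T cell", "Fibroblast", "Acinar", "Fibroblast")
)

# Count cells per cell type (rows) and condition (columns)
counts <- table(meta$old.ident, meta$orig.ident)

# Long format: Var1 = cell type, Var2 = condition, Freq = number of cells
counts_df <- as.data.frame(counts)
print(counts_df)
```

The resulting data frame has the same Var1, Var2 and Freq columns as the one above, so the renaming, percentage and plotting steps can be applied to it as they are.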
Does this answer your question?
Upvotes: 1