Reputation: 846
I have several TSV files, each with one column and the same number of rows.
I am plotting them using ggplot (stat_smooth), but I would like the program to be flexible, meaning that it adds more stat_smooth
function calls based on how many files are provided as input.
The number of input files is taken from length(commandArgs(TRUE))
and I am storing my data in a variable as
cov=data.frame(sapply(1:length(commandArgs(TRUE)),
function(i)read.csv(proteins[i],sep='\t',colClasses=c(NA,"NULL"))))
where proteins<-commandArgs(TRUE)
holds the file names, and I add the colnames
in a separate piece of code.
Now the problem comes with ggplot: how can I add stat_smooth
calls on the fly, depending on the number of arguments provided?
I was trying something like
m=ggplot(cov,aes(seq,cov[,2]))
p=function(i){return(stat_smooth(aes(color=colnames(cov)[i])))}
m+p(1)+.....
adding the p
layers to the base ggplot object m
using a for loop,
but that doesn't seem to make sense.
There should be a more efficient way to do this. The idea would be to construct the calls based on the columns in the cov
data.frame, which has data like
seq fileA fileB
1 8429.262 8606.623
2 8766.138 9066.361
3 9081.893 9456.915
4 9342.380 9784.373
5 9480.860 10067.121
6 9581.437 10312.253
Can someone suggest something?
Upvotes: 0
Views: 88
Reputation: 81733
First, reshape your data to the long format, with one value per row.
library(reshape2)
covM <- melt(cov, id.var = "seq")
This returns the following data frame:
seq variable value
1 1 fileA 8429.262
2 2 fileA 8766.138
3 3 fileA 9081.893
4 4 fileA 9342.380
5 5 fileA 9480.860
6 6 fileA 9581.437
7 1 fileB 8606.623
8 2 fileB 9066.361
9 3 fileB 9456.915
10 4 fileB 9784.373
11 5 fileB 10067.121
12 6 fileB 10312.253
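If reshape2 is not installed, the same long layout can be produced with base R. This is only a sketch of an equivalent, not what the melt call above does internally: it rebuilds the six sample rows from the question and relies on stack(), whose output columns are named values and ind.

```r
# Sample data copied from the question (wide format: one column per file)
cov <- data.frame(
  seq   = 1:6,
  fileA = c(8429.262, 8766.138, 9081.893, 9342.380, 9480.860, 9581.437),
  fileB = c(8606.623, 9066.361, 9456.915, 9784.373, 10067.121, 10312.253)
)

# stack() concatenates the file columns and records the source column in `ind`
long <- stack(cov[-1])

# Repeat seq once per file column so every value keeps its row index
covM <- data.frame(
  seq      = rep(cov$seq, times = ncol(cov) - 1),
  variable = long$ind,
  value    = long$values
)

str(covM)
```

Because the number of file columns is taken from ncol(cov), this works for any number of input files, not just two.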
Once you have the new object, it's easy to plot:
library(ggplot2)
ggplot(covM, aes(seq, value)) +
stat_smooth(aes(color = variable))
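Tying this back to the command-line setup in the question, the sketch below goes from a vector of file names to a plot with one smooth per file, however many files are passed. It makes some assumptions that are not in the question: each file is read as a tab-separated file without a header row, the column of interest is the first one, and the helper name read_long and the output file name smooth.pdf are made up for illustration. It builds the long data frame directly, so no separate reshaping step is needed.

```r
library(ggplot2)

# Hypothetical helper: read each file and stack the results in long format.
# Assumes tab-separated files with no header row; only the first column is used.
read_long <- function(files) {
  pieces <- lapply(files, function(f) {
    values <- read.delim(f, header = FALSE)[[1]]
    data.frame(seq      = seq_along(values),
               variable = basename(f),   # file name labels the curve
               value    = values)
  })
  do.call(rbind, pieces)
}

proteins <- commandArgs(TRUE)
if (length(proteins) > 0) {
  covM <- read_long(proteins)
  p <- ggplot(covM, aes(seq, value)) +
    stat_smooth(aes(color = variable))
  ggsave("smooth.pdf", p)  # output name is an arbitrary choice
}
```

Assuming the script is saved as plot.R (also a made-up name), it would be run as Rscript plot.R fileA.tsv fileB.tsv, and adding a third file on the command line adds a third curve without touching the plotting code.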
Upvotes: 1