Reputation: 345
I am writing a GNUmakefile for a workflow that analyses biological sequence data. The data comes in a format called fastq and is passed through a number of cleaning and analysis tools. I have attached what I have written so far, which runs quality control before cleaning, the cleaning steps, and then quality control again afterwards. My problem is that I'm not sure how to get the 'fastqc' commands to run, as their targets are not dependencies of any of the other steps in the workflow.
%_sts_fastqc.html %_sts_fastqc.zip: %_sts.fastq
# perform quality control after cleaning reads
	fastqc $^
%_sts.fastq: %_st.fastq
# trim reads based on quality
	sickle se -f $^ -t illumina -o $@
%_st.fastq: %_s.fastq
# remove contaminated reads
	tagdust -s adapters.fa $^
%_s.fastq: %.fastq
# trim adapters
	scythe -a adapters.fa -o $@ $^
%_fastqc.html %_fastqc.zip: %.fastq
# perform quality control before cleaning reads
	fastqc $^
%.fastq: %.sra
# convert .sra to .fastq
	fastq-dump $^
Upvotes: 0
Views: 1081
Reputation: 151391
I believe adding these lines to the start of your Makefile will do what you are asking for:
SOURCES:=$(wildcard *.sra)
TARGETS:=$(SOURCES:.sra=_fastqc.html) $(SOURCES:.sra=_fastqc.zip)\
$(SOURCES:.sra=_sts_fastqc.html) $(SOURCES:.sra=_sts_fastqc.zip)
.PHONY: all
all: $(TARGETS)
What this does is grab all .sra files from the file system and build a list of targets by replacing the extension with whatever strings are necessary to produce the target names. (Note that since the html and zip targets are produced by the same command, I could have listed only one or the other, but I've decided to list both in case the rules change and the html and zip targets are ever produced separately.) Then it sets the phony all target to build all the computed targets. Here is a Makefile I've modified from yours by adding @echo everywhere, which I used to check that things were okay without having to run the actual commands in your Makefile. You could copy and paste it into a file to first check that everything is fine before modifying your own Makefile with the lines above. Here it is:
SOURCES:=$(wildcard *.sra)
TARGETS:=$(SOURCES:.sra=_fastqc.html) $(SOURCES:.sra=_fastqc.zip)\
    $(SOURCES:.sra=_sts_fastqc.html) $(SOURCES:.sra=_sts_fastqc.zip)
.PHONY: all
all: $(TARGETS)
%_sts_fastqc.html %_sts_fastqc.zip: %_sts.fastq
# perform quality control after cleaning reads
	@echo fastqc $^
%_sts.fastq: %_st.fastq
# trim reads based on quality
	@echo sickle se -f $^ -t illumina -o $@
%_st.fastq: %_s.fastq
# remove contaminated reads
	@echo tagdust -s adapters.fa $^
%_s.fastq: %.fastq
# trim adapters
	@echo scythe -a adapters.fa -o $@ $^
%_fastqc.html %_fastqc.zip: %.fastq
# perform quality control before cleaning reads
	@echo fastqc $^
%.fastq: %.sra
# convert .sra to .fastq
	@echo fastq-dump $^
I tested it here by running touch a.sra b.sra and then running make. It ran the commands for both files.
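To see what the substitution references expand to before building anything, one option is to print the variables with $(info ...) while the Makefile is parsed. This is only a minimal sketch, assuming the lines sit at the top of the Makefile; what gets printed depends on which .sra files are present in the directory:

```make
SOURCES := $(wildcard *.sra)
TARGETS := $(SOURCES:.sra=_fastqc.html) $(SOURCES:.sra=_fastqc.zip) \
           $(SOURCES:.sra=_sts_fastqc.html) $(SOURCES:.sra=_sts_fastqc.zip)

# Print the computed lists while make reads the Makefile.
$(info SOURCES = $(SOURCES))
$(info TARGETS = $(TARGETS))

.PHONY: all
all: $(TARGETS)
```

Running make -n is an alternative to editing the recipes: it prints the commands that would be run without executing them.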
Upvotes: 1
Reputation: 35246
Instead of using patterns, I would use a 'define':
# 'all' is not a file
.PHONY: all
# a list of 4 samples
SAMPLES=S1 S2 S3 S4
# define a macro named analyzefastq. It takes one argument, $(1). The '$' must be doubled to protect it for the later expansion by $(eval)
define analyzefastq
# create a .st.fastq from the .fastq for file $(1)
$(1).st.fastq : $(1).fastq
	tagdust -s adapters.fa $$^
# create a .fastq from the .sra for file $(1)
$(1).fastq : $(1).sra
	fastq-dump $$^
endef
# all: the final target depends on every sample name with the suffix '.st.fastq'
all: $(addsuffix .st.fastq,$(SAMPLES))
# loop over each sample: the loop variable is 'S'; call the macro with the sample name as its argument and eval the result
$(foreach S,${SAMPLES},$(eval $(call analyzefastq,$(S))))
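When a define/eval template does not behave as expected, a common way to inspect it is to swap $(eval ...) for $(info ...), so that make prints the generated rules instead of parsing them. A rough sketch using the same macro (the sample names are placeholders, and the dummy all target is only there so that make has something to build):

```make
SAMPLES = S1 S2

define analyzefastq
$(1).st.fastq: $(1).fastq
	tagdust -s adapters.fa $$^

$(1).fastq: $(1).sra
	fastq-dump $$^
endef

# Print the text that $(eval) would parse, one expansion per sample.
$(foreach S,$(SAMPLES),$(info $(call analyzefastq,$(S))))

# Dummy default target so that running make does not fail with "No targets".
.PHONY: all
all: ; @true
```

In the printed rules each $$^ should appear as $^, which is what $(eval) would then treat as the automatic variable.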
I also use my tool jsvelocity https://github.com/lindenb/jsvelocity to generate large Makefiles for NGS:
https://gist.github.com/lindenb/3c07ca722f793cc5dd60
Upvotes: 0