Reputation: 301
I have a rule that requires a folder as its input. The problem is that rule merge_fastqs uses the merged folder to combine FASTQs from different lanes into one big FASTQ per sample, but rule cellranger_count kicks off at almost the same time as the merged folder is created. cellranger_count then errors out because there isn't anything in the folder yet. Can I use touch or some other method to hold cellranger_count until merge_fastqs is done?
rule all:
    input: completeflag
import glob

# use a function to identify input fastqs to circumvent barcode irregularities
def input_fastq(wildcards):
    fnames = glob.glob(config['fq_glob'] % wildcards.sampleID)  # the %s wildcard is already in the config string
    return sorted(fnames)  # make sure R1 is first
# Rule used to get data together and named for CellRanger
rule merge_fastqs:
    input: input_fastq
    output:
        'merged/{sampleID}_S1_L001_R1_001.fastq.gz',
        'merged/{sampleID}_S1_L001_R2_001.fastq.gz'
    threads: 4
    params:
        r1 = config['pair_id'][0],
        r2 = config['pair_id'][1],
    run:
        r1 = [x for x in input if params.r1 in x]
        r2 = [x for x in input if params.r2 in x]
        shell('cat %s > {output[0]}' % ' '.join(r1))
        shell('cat %s > {output[1]}' % ' '.join(r2))
# make sure cell ranger module is loaded #module load cellranger/6.1.2
# all necessary tools need to be in scRNAseq reference folder
rule cellranger_count:
    input:
        'merged'
    output:
        maxtrix_h5 = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix.h5',
        metrics = '{sampleID}_TenXAnalysis/outs/metrics_summary.csv',
        dir = directory('{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix'),
        barcodes = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/barcodes.tsv.gz',
        features = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/features.tsv.gz',
        matrix = '{sampleID}_TenXAnalysis/outs/raw_feature_bc_matrix/matrix.mtx.gz',
        html = '{sampleID}_TenXAnalysis/outs/web_summary.html',
    threads: 16
    params:
        # This needs to be fixed to a location
        ref = '/PATH/refdata-gex-GRCh38-2020-A',
        # Commented out for now
        #sample_id = '{sampleID}_merged'
    ## id = unique run ID string
    ## fastqs = path to data
    ## sample = sample names as specified in the sample sheet
    ## transcriptome = path to Cell Ranger compatible transcriptome reference
    ## localcores = tells cellranger how many cores to use
    ## localmem = how much memory to use
    shell:
        """
        rm -rf {wildcards.sampleID}_TenXAnalysis
        cellranger count --id={wildcards.sampleID}_TenXAnalysis \
            --fastqs={input} \
            --sample={wildcards.sampleID} \
            --transcriptome={params.ref} \
            --localcores={threads} \
            --localmem=128
        """
Upvotes: 1
Views: 105
Reputation: 16571
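Snakemake orders jobs by matching each rule's input files to the outputs of other rules. No rule declares 'merged' as an output, so nothing ties cellranger_count to merge_fastqs, and it can start as soon as the folder exists. One option is to wait for the output files of the previous rule to be created (rather than the folder):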
rule cellranger_count:
    input:
        folder='merged',
        files=rules.merge_fastqs.output
    # skipping details in the original rule
    shell:
        """
        cellranger count --id={wildcards.sampleID}_TenXAnalysis \
            --fastqs={input.folder} \
        # skipping details in the original rule
        """
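If you would rather spell the dependency out, an input function in the style of your input_fastq should work as well. The sketch below is only an illustration: merged_fastqs, fastq_dir and MERGED_DIR are names I made up, and the file name pattern is copied from the outputs of merge_fastqs in the question. It is plain Python, so you can run it outside Snakemake to check the paths it returns.

```python
import os
from types import SimpleNamespace

MERGED_DIR = "merged"  # assumption: same folder name as in the question


def merged_fastqs(wildcards):
    """Return the two merged FASTQ paths that merge_fastqs writes for one sample."""
    sample = wildcards.sampleID
    return [
        os.path.join(MERGED_DIR, f"{sample}_S1_L001_R1_001.fastq.gz"),
        os.path.join(MERGED_DIR, f"{sample}_S1_L001_R2_001.fastq.gz"),
    ]


def fastq_dir(paths):
    """Return the single directory shared by all paths (the value for --fastqs)."""
    dirs = {os.path.dirname(p) for p in paths}
    if len(dirs) != 1:
        raise ValueError(f"expected exactly one directory, got {sorted(dirs)}")
    return dirs.pop()


# Stand-in for the wildcards object Snakemake passes to input functions.
example = SimpleNamespace(sampleID="sampleA")
paths = merged_fastqs(example)
print(paths)
print(fastq_dir(paths))
```

In the Snakefile you would then write `input: merged_fastqs` in cellranger_count, and pass the folder through params, for example `fq_dir = lambda wildcards, input: fastq_dir(input)`, using `--fastqs={params.fq_dir}` in the shell command. That way Snakemake only tracks the two files, and the job cannot start before merge_fastqs has written them for that sample.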
Upvotes: 2