Peter Chung

Reputation: 1122

snakemake PICARD merge bam files

I am new to Snakemake and have an issue using Picard MergeSamFiles to merge BAM files into a single BAM file. I would like to merge 1_sorted.bam, 2_sorted.bam, ..., 10_sorted.bam into one BAM file named after the directory.

import snakemake.io 
import os.path

PICARD="/data/src/picard.jar"
(SAMPLES,)=glob_wildcards("bam/{sample}_sorted.bam")
NAME=os.path.dirname

def bam_inputs(wildcards):
    files = expand("bam/{sample}_sorted.bam", sample=SAMPLES)
    INPUT = "I="+files 
    return INPUT

rule all:
    input: "bam/{NAME}.bam"

rule merge_bams:
    input: bam_inputs
    output: "bam/{NAME}.bam"
    params: mrkdup_jar="/data/src/picard.jar"
    shell: "java -Xmx16G -jar {params.mrkdup_jar} MergeSamFiles \
    {input} \
    O={output} \
    SORT_ORDER=coordinate \
    ASSUME_SORTED=false \
    USE_THREADING=true"

Error:

Building DAG of jobs...
WildcardError in line 12 of /data/data/Samples/snakemake-example/WGS-test/step3.smk:
Wildcards in input files cannot be determined from output files:
'NAME'

I don't know how to merge all the BAM files into one, or how to use the directory name as a variable for the name of the final BAM file. Please advise.

UPDATE:

import snakemake.io

PICARD="/data/src/picard.jar"
(SAMPLES,)=glob_wildcards("bam/{sample}_sorted.bam")
#NAME=os.path.dirname
NAME="test"

rule all:
    input: "bam/{name}.bam".format(name=NAME)

rule merge_bams:
    input: expand("bam/{sample}_sorted.bam",sample=SAMPLES)
    output: "bam/{name}.bam".format(name=NAME)
    params: mrkdup_jar="/data/src/picard.jar"
    shell: """java -Xmx16G -jar {params.mrkdup_jar} MergeSamFiles \
    {"I=" + input} \
    O={output} \
    SORT_ORDER=coordinate \
    ASSUME_SORTED=false \
    USE_THREADING=true """

ERROR:

RuleException in line 11 of /data/data/Samples/snakemake-example/WGS-test/step3.smk:
NameError: The name '"I=" + input' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}

MergeSamFiles \
I=sub1_sorted.bam I=sub2_sorted.bam I=sub3_sorted.bam \
O=sub.bam \
SORT_ORDER=coordinate \
ASSUME_SORTED=false \
USE_THREADING=true

Upvotes: 1

Views: 520

Answers (1)

Dmitry Kuzminov

Reputation: 6584

Let's start with rule all. You need to tell Snakemake exactly which file you expect to build as the target: no wildcards, just an unambiguous path. You said it should be a BAM file named after the directory?

rule all:
    input: f"bam/{NAME}.bam"

Note that by using an f-string I converted {NAME} from a wildcard into an exact string value taken from the variable NAME. You can do this any other way you like, e.g. "bam/{name}.bam".format(name=NAME).
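To see the difference outside Snakemake, here is a minimal plain-Python sketch. The value "test" is only a placeholder (the question's UPDATE uses the same one); the point is that a plain string keeps its braces, while the f-string and .format() versions are already concrete paths.

```python
NAME = "test"  # placeholder value, assumed for illustration

# Plain string: the braces survive, so Snakemake would read {NAME} as a wildcard
plain = "bam/{NAME}.bam"

# f-string and str.format: the braces are filled in from the Python variable
via_fstring = f"bam/{NAME}.bam"
via_format = "bam/{name}.bam".format(name=NAME)

print(plain)
print(via_fstring)
print(via_format)
```

Either of the last two forms gives rule all a fixed target that needs no wildcard resolution.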

Next, keep in mind that {NAME} in the "all" rule and {NAME} in the "merge_bams" rule are now different entities with nothing in common: the first is filled in from the Python variable, while the second is still a wildcard. Moreover, the wildcard doesn't necessarily equal the NAME variable that you defined on line 6. I would give the wildcard a different name to avoid confusion.
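As for the NAME variable itself: the question assigns os.path.dirname without calling it, so NAME holds a function rather than a string. Below is a rough sketch of one way to get a directory name as a string. The helper directory_name is hypothetical, and using the current working directory is only an assumption about which directory is meant.

```python
import os.path

def directory_name(path):
    """Return the last component of a directory path as a plain string."""
    # abspath also normalizes a trailing slash, so "runs/WGS-test/" still works
    return os.path.basename(os.path.abspath(path))

# Assumption: the directory of interest is the one the workflow is run from
NAME = directory_name(".")
print(NAME)
```

With NAME set like this, f"bam/{NAME}.bam" in rule all resolves to a concrete file name.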

One more thing: I'm not sure what you are doing in the bam_inputs function:

INPUT = "I="+files 

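The result of the expand function is enough to specify the input for the merge_bams rule. If you need to add "I=" in front of every file in the list, do it when the command is built rather than in the input. The join cannot be written as an f-string expression in the shell: section, because an f-string is evaluated when the Snakefile is parsed, before the rule's input exists; a params function is evaluated per job instead:

rule merge_bams:
    input: expand("bam/{sample}_sorted.bam", sample=SAMPLES)
    output: "bam/{NAME}.bam"
    params:
        mrkdup_jar="/data/src/picard.jar",
        # Build "I=a.bam I=b.bam ..." once the job's input files are known
        bam_args=lambda wildcards, input: " ".join("I=" + f for f in input)
    shell: """java -Xmx16G -jar {params.mrkdup_jar} MergeSamFiles \
        {params.bam_args} \
        O={output} \
        SORT_ORDER=coordinate \
        ASSUME_SORTED=false \
        USE_THREADING=true"""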
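The "I=" prefixing can be checked in plain Python, outside Snakemake. This is a small sketch: picard_input_args is a hypothetical helper, and the two file names stand in for whatever expand() would return. It also shows why the question's "I=" + files fails.

```python
def picard_input_args(files):
    """Join BAM paths into the 'I=a.bam I=b.bam' form used by MergeSamFiles."""
    return " ".join("I=" + path for path in files)

# Hypothetical stand-in for the list that expand(...) returns
files = ["bam/1_sorted.bam", "bam/2_sorted.bam"]
args = picard_input_args(files)
print(args)

# The question's version fails: a str cannot be concatenated with a list
try:
    "I=" + files
except TypeError as err:
    print("TypeError:", err)
```

The prefix has to be applied to each element and the results joined, which is what the generator expression inside join does.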

Upvotes: 2
