Reputation: 1196
I try to setup a snakefile (snakemake 7.19.1). The final output should contain a timestamp. This is a minimal example:
#!/bin/python
print('import packages')
from datetime import datetime
import time
now = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
tarfile = now + '_stardist.tar'
print(tarfile)
rule all:
input: {tarfile}
rule main_rule:
input:
output: {tarfile}
run:
shell('touch ' + output[0])
If I run that snakefile with snakemake -s snakefile --cores 1
I get the following output resulting in an error message:
import packages
2025-02-04_11-33-38_stardist.tar
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count min threads max threads
--------- ------- ------------- -------------
all 1 1 1
main_rule 1 1 1
total 2 1 1
Select jobs to execute...
[Tue Feb 4 11:33:38 2025]
rule main_rule:
output: 2025-02-04_11-33-38_stardist.tar
jobid: 1
reason: Missing output files: 2025-02-04_11-33-38_stardist.tar
resources: tmpdir=/tmp
import packages
2025-02-04_11-33-39_stardist.tar
Building DAG of jobs...
Using shell: /usr/bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Select jobs to execute...
Waiting at most 5 seconds for missing files.
MissingOutputException in rule main_rule in file /lustre/projects/xxx/test_timestamp_minimal/snakefile, line 14:
Job 1 completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
2025-02-04_11-33-38_stardist.tar
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2025-02-04T113338.450326.snakemake.log
As it is now the name of tarfile
is created twice and the file which is finally created does not have the same timestamp like the file which is expected as output.
Can anybody please help?
Update
I created a fresh conda environment with python 3.12 and snakemake 8.27.1. I tested my code on a local Windows10 machine (miniforge) and on CentOS7 (the head node of an HPC environment). While there is no issue on my local machine, I get the same error on the HPC system.
This is the output on the local machine (I do not put again the output of the HPC-node as it's comparable to what I've already reported):
import packages
2025-02-05_13-13-27_stardist.tar
Assuming unrestricted shared filesystem usage.
host: screening-pc-4
Building DAG of jobs...
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Job stats:
job count
--------- -------
all 1
main_rule 1
total 2
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 5 13:13:27 2025]
localrule main_rule:
output: 2025-02-05_13-13-27_stardist.tar
jobid: 1
reason: Missing output files: 2025-02-05_13-13-27_stardist.tar
resources: tmpdir=C:\Users\xxx\AppData\Local\Temp
[Wed Feb 5 13:13:27 2025]
Finished job 1.
1 of 2 steps (50%) done
Select jobs to execute...
Execute 1 jobs...
[Wed Feb 5 13:13:27 2025]
localrule all:
input: 2025-02-05_13-13-27_stardist.tar
jobid: 0
reason: Input files updated by another job: 2025-02-05_13-13-27_stardist.tar
resources: tmpdir=C:\Users\xxx\AppData\Local\Temp
[Wed Feb 5 13:13:27 2025]
Finished job 0.
2 of 2 steps (100%) done
Complete log: .snakemake\log\2025-02-05T131327.214796.snakemake.log
What I do not understand: In the case of the Win10 environment (where it's working): Snakemake reports the following lines just once:
import packages
2025-02-04_14-14-23_stardist.tar
while when it goes wrong, the snakemake file seems to be run twice (once for each job)? So the first execution gets a different filename due to the execution time compared to the second job.
Upvotes: 0
Views: 68
Reputation: 9954
This is a current, known gotcha.
"python twice" as search terms in Snakemake's GitHub Issues page came up with the following still open (at this time) issue posted mid-2023 & labeled as a 'bug', although maybe that label is erroneous according to Tim Booth because of parallel job handling:
ISSUE: 'Snakefile is run a second time if rule contains run directive'.
Quote from stefmldk in that thread (linked specifically above):
"I can confirm this observation (version 7.32.4). Frustrating if you use time stamps in file names. It seems that a list of output files to be expected is generated in the first run only, but if a time stamp and thus file names get changed on the second run, the files expected by Snakemake never get created and a MissingOutputException is thrown after 5 seconds. Temporary solution that works most of the time: Exclude seconds from time stamps."
This was the exact issue and work-around we worked out via the comments in looking into this.
To highlight the current work-around that agrees with our findings:
"Temporary solution that works most of the time: Exclude seconds from time stamps."
Upvotes: 0
Reputation: 713
Despite what commenters have said in issue #2350, this phenomenon of Snakemake running the python code twice is a feature, not a bug, and it's not going to be "fixed". Even if the behaviour was changed for the simple -j1
case it will still manifest as soon as you allow parallel jobs or submit your jobs to a HPC cluster (this includes your fixed code, @antje-janosch!)
The actual bug in this code is on line 16. If you replace {tarfile}
with "{tarfile}"
(only on line 16, not line 12) in the original example you will find that it works, just by adding the two quotes.
The difference is this: {tarfile}
means "a python set containing the single fixed string from the variable tarfile", whereas "{tarfile}"
is a template string that will act as a placeholder for any filename. These are not the same!
Upvotes: 1
Reputation: 1196
Thanks to @Wayne!
According to the issue Snakefile is run a second time if rule contains run directive #2350 I decided to use "shell" instead of "run" in my main_rule. This prevents snakemake from running the snakefile code twice. My working snakefile now looks like this:
#!/bin/python
print('import packages')
from datetime import datetime
import time
now = datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
tarfile = now + '_stardist.tar'
print(tarfile)
rule all:
input: {tarfile}
rule main_rule:
input:
output: {tarfile}
shell:
"touch {tarfile}"
Upvotes: 0