Reputation: 557
I have issues with the following Snakemake pipeline. I get an error after a dry run of the Snakefile:
Building DAG of jobs... WorkflowError: Target rules may not contain wildcards. Please specify concrete files or a rule without wildcards at the command line, or have a rule without wildcards at the very top of your workflow (e.g. the typical "rule all" which just collects all results you want to generate in the end)
I tried many things in the rule all part, including listing the required files manually to avoid wildcards, but it did not help. I also uninstalled the old version of Snakemake and installed the latest one, to no effect.
Here is the Snakefile. Any help would be highly appreciated:
import os
import pandas as pd

# Read beta and b combinations from CSV
beta_b_values = []
with open("beta_b_combinations.csv", "r") as f:
    next(f)  # Skip header
    for line in f:
        beta, b = line.strip().split(",")
        safe_beta = beta.replace(".", "_")
        safe_b = b.replace(".", "_")
        beta_b_values.append((safe_beta, safe_b))

# Print beta_b_values after defining it for debugging
print("Loaded beta_b_values:", beta_b_values)

# Define paths dynamically
def get_folder(beta, b):
    return f"beta_{beta}_b_{b}"

def get_data_folder(beta, b):
    return f"{get_folder(beta, b)}/data_1_first500"

# Step 1: Run C++ Simulations via Bash Script
rule run_simulations:
    output:
        "{folder}/data_1_first500/replica_{i}.csv"
    params:
        executable="metropolis_extended"
    shell:
        """
        set -e  # Stop script on any error
        bash run_metropolis_extended.sh {params.executable} {wildcards.folder} {wildcards.i} {output}
        """

# Step 2: Merge CSV Files After Simulations
rule merge_replicas:
    input:
        "simulations_done.flag",
        expand("{folder}/data_1_first500/replica_{i}.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values], i=range(1, 501))
    output:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merge_files.py --folder "$folder/data_1_first500" --output "$folder/merged_replicas.csv"
        done
        """

# Step 3: Compute Means After Merging
rule compute_means:
    input:
        expand("{folder}/merged_replicas.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            python merged_replicas_with_means.py --input "$folder/merged_replicas.csv" --output "$folder/merged_replicas_with_means.csv"
        done
        """

# Step 4: Generate Plots
rule generate_plots:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/plots_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript plot_results.R --output "$folder"
            touch "$folder/plots_done.flag"
        done
        """

# Step 5: Compute Thermalized Averages
rule compute_thermalized_averages:
    input:
        expand("{folder}/merged_replicas_with_means.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values]),
        expand("{folder}/thermalized_averages_done.flag", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    shell:
        """
        for folder in {{" ".join([get_folder(b[0], b[1]) for b in beta_b_values])}}; do
            Rscript thermalized_quantities.R --input "$folder/merged_replicas_with_means.csv" --output "$folder/thermalized_averages.csv"
            touch "$folder/thermalized_averages_done.flag"
        done
        """

# Step 6: Compute Errors via Jupyter Notebook
rule compute_errors:
    input:
        expand("{folder}/thermalized_averages.csv", folder=[get_folder(b[0], b[1]) for b in beta_b_values])
    output:
        ["errors_computed.flag", "beta_b_and_means_with_errors.csv"]
    shell:
        """
        papermill computing_errors.ipynb computing_errors_output.ipynb
        touch errors_computed.flag
        """

# Step 7: Collect all results into a Single CSV file and generate Final Plots
rule generate_final_plots:
    input:
        "beta_b_and_means_with_errors.csv"
    output:
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"
    shell:
        """
        python collect_and_plot_errors.py
        """

# Precompute the list of plots_done.flag files
plots_done_files = [f"{get_folder(beta, b)}/plots_done.flag" for beta, b in beta_b_values]

# Final Rule: Defines Overall Workflow Goal
rule all:
    input:
        # Explicitly list all plots_done.flag files
        plots_done_files,
        "errors_all_beta_b_combinations.csv",
        "magnetization_plot.png",
        "hamiltonian_plot.png"
Upvotes: 0
Views: 42
Reputation: 713
The answer from @kEks above is the solution to your question.
However, I'd further suggest that you are missing the point of Snakemake rules by having four rules that all contain shell loops. Is there any reason why you are making a rule that triggers once and runs every command in a loop, rather than having a rule that applies to any individual output file and letting Snakemake do the work? Your code would be considerably simpler if you wrote it this way. You could also then use the touch() output type of Snakemake to save you from explicitly running the touch ... command in the shell part.
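For example, your merge and plot steps could each be a single wildcard rule, with Snakemake iterating over the folders for you. A sketch, reusing the scripts and folder layout from your post (the exact flags are taken from your shell loops):

```python
# One job per folder: the {folder} wildcard is filled in by whatever
# merged_replicas.csv files downstream rules (or rule all) request.
rule merge_replicas:
    input:
        # Double braces escape the wildcard so expand() only fills in i
        expand("{{folder}}/data_1_first500/replica_{i}.csv", i=range(1, 501))
    output:
        "{folder}/merged_replicas.csv"
    shell:
        "python merge_files.py --folder {wildcards.folder}/data_1_first500 --output {output}"

rule generate_plots:
    input:
        "{folder}/merged_replicas_with_means.csv"
    output:
        # touch() creates the flag file automatically on success,
        # no explicit `touch` needed in the shell command
        touch("{folder}/plots_done.flag")
    shell:
        "Rscript plot_results.R --output {wildcards.folder}"
```

With rules written this way, Snakemake can also run the per-folder jobs in parallel and rerun only the folders whose inputs changed.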
Upvotes: 2
Reputation: 588
By default, Snakemake treats the first rule in the file as its target. Thus, if you just run something like snakemake -n, it tries to solve for rule run_simulations, which still contains wildcards. If you move rule all above rule run_simulations, it should work.
Upvotes: 2