String indices error in Snakefile when switching from JSON to YAML config file

Question

I wrote a simple ChIP-seq pipeline in Snakemake using a JSON-formatted config file, and the dry run worked as expected. After reading more about best practices, I switched to a YAML-formatted config file and made what I thought were the appropriate changes, but now I get a "string indices must be integers" error.

The pipeline runs Trimmomatic, FastQC, Bowtie2 and MACS2, using wrappers wherever possible. For simplicity I'm including only the Trimmomatic and FastQC code, since I think the problem is in reading the config file. The config file contains the sample sheets (three CSV files), directories (to create a consistent directory structure), an output file base name, and reference sequence data (genome, etc.).

config.yaml

---

samples:
  sample_names:samples.csv
  sample_files:files.csv
  sample_comparisons:comps.csv
directories:
  base_dir: /base/
  sample_dir: Samples/
  seq_dir: Raw_Sequences/
  trim_dir: Sequences/
  aln_dir: Alignments/
  peak_dir: Peak_Calling/
  logs_dir: Logs/
out_base: base
ref_seq_data:
  genome:
  bt2_index:

...
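A likely cause, worth checking first: in YAML a colon separates a key from its value only when it is followed by whitespace. Lines such as `sample_names:samples.csv` are therefore not key/value pairs, and a YAML loader such as PyYAML reads the three indented lines under `samples:` as one folded string. The sketch below is a rough, stdlib-only heuristic (the function `find_missing_colon_space` is hypothetical, not part of Snakemake or PyYAML) that flags unquoted `key:value` lines with no space after the colon. It does not parse YAML, so it ignores quoting, flow collections and multi-line strings.

```python
import re

# Heuristic pattern: an unquoted key immediately followed by ':' and a non-space
# character, e.g. "sample_names:samples.csv". YAML reads this as a plain string.
_MISSING_SPACE = re.compile(r"^\s*([A-Za-z_][\w.-]*):(\S)")


def find_missing_colon_space(yaml_text):
    """Return (line_number, line) pairs that look like 'key:value' without a space.

    Rough lint only (hypothetical helper): comment lines are skipped, but
    quoting, flow collections and multi-line strings are not understood.
    """
    hits = []
    for lineno, line in enumerate(yaml_text.splitlines(), start=1):
        if line.lstrip().startswith("#"):
            continue
        if _MISSING_SPACE.match(line):
            hits.append((lineno, line.rstrip()))
    return hits


# Excerpt of the config above.
config_text = """\
samples:
  sample_names:samples.csv
  sample_files:files.csv
  sample_comparisons:comps.csv
directories:
  base_dir: /base/
"""

for lineno, line in find_missing_colon_space(config_text):
    print(f"line {lineno}: {line!r} (add a space after the colon)")
```

On this excerpt it flags lines 2 to 4. Writing them as `sample_names: samples.csv` (and likewise for the other two) makes `samples` a mapping, which is what the Snakefile's two-level lookups expect.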

Snakefile

import pandas as pd

shell.prefix("set -euo pipefail; ")

configfile: "config.yaml"

sample_names = config["samples"]["sample_names"]
sample_files = config["samples"]["sample_files"]
sample_comparisons = config["samples"]["sample_comparisons"]
base_dir = config["directories"]["base_dir"]
sample_dir = config["directories"]["base_dir"]+config["directories"]["sample_dir"]
seq_dir = config["directories"]["seq_dir"]
trim_dir = config["directories"]["sample_dir"]+config["directories"]["trim_dir"]
aln_dir = config["directories"]["sample_dir"]+config["directories"]["aln_dir"]
peak_dir = config["directories"]["sample_dir"]+config["directories"]["peak_dir"]
log_dir = config["directories"]["sample_dir"]+config["directories"]["logs_dir"]
genome = config["ref_seq_data"]["genome"]
out_base = config["out_base"]

samples = pd.read_csv(sample_names, index_col="sample")
files = pd.read_csv(sample_files, index_col = "sample")
comparisons = pd.read_csv(sample_comparisons)

rule all:
  input:
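    # Sample names are the index (index_col="sample"); iterating the DataFrame itself yields its column names.
    expand(log_dir+"{sample}_{read}_fastqc.html", sample = samples.index, read = [1,2])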

# Load Rules

include: "Snakemake_rules/NGS_QC.smk"

The error message I receive is:

TypeError in line 7 of Snakefile: string indices must be integers
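Counting from the top of the Snakefile, line 7 is `sample_names = config["samples"]["sample_names"]`. If `samples:` in the YAML file was read as a single string (YAML needs a space after the colon, so `sample_names:samples.csv` and the two lines after it fold into one plain string), then `config["samples"]` is a `str`, and indexing a string with another string raises this `TypeError`. The sketch below reproduces that with hand-written dictionaries standing in for the loaded config; it is an assumption about the loader's output and does not call Snakemake or PyYAML.

```python
# Stand-in for what a YAML loader returns for the question's config
# (assumption: the three 'key:value' lines are folded into one plain string).
config_as_parsed = {
    "samples": "sample_names:samples.csv sample_files:files.csv sample_comparisons:comps.csv",
    "directories": {"base_dir": "/base/", "sample_dir": "Samples/"},
    "out_base": "base",
}

# What the Snakefile expects: "samples" is itself a mapping.
config_as_intended = {
    "samples": {
        "sample_names": "samples.csv",
        "sample_files": "files.csv",
        "sample_comparisons": "comps.csv",
    },
    "directories": {"base_dir": "/base/", "sample_dir": "Samples/"},
    "out_base": "base",
}


def lookup_sample_names(config):
    # Same expression as line 7 of the Snakefile.
    return config["samples"]["sample_names"]


print(lookup_sample_names(config_as_intended))  # samples.csv

try:
    lookup_sample_names(config_as_parsed)
except TypeError as err:
    # Newer Python versions may append the offending index type to this message.
    print(f"TypeError: {err}")
```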

When I used the JSON-formatted config file, it was not split into groups (each entry was a standalone top-level key), and accessing each value with config[] returned the correct value.
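Grouping by itself should not be the problem, since `config` is a plain dictionary either way and a grouped JSON file would need the same two-level lookups. One syntax difference does matter: JSON accepts `"key":"value"` with no space after the colon, while an unquoted YAML key needs `key: value`. The sketch below uses only the stdlib `json` module; the grouped JSON is a hypothetical rewrite of the question's config, not the original file.

```python
import json

# In JSON, no space after the colon is fine.
assert json.loads('{"sample_names":"samples.csv"}') == {"sample_names": "samples.csv"}

# Flat layout, like the original JSON config: every value is a top-level key.
flat_json = """
{
  "sample_names": "samples.csv",
  "base_dir": "/base/",
  "sample_dir": "Samples/"
}
"""

# Grouped layout mirroring the YAML file's sections (hypothetical rewrite).
grouped_json = """
{
  "samples": {"sample_names": "samples.csv"},
  "directories": {"base_dir": "/base/", "sample_dir": "Samples/"}
}
"""

flat = json.loads(flat_json)
grouped = json.loads(grouped_json)

# One lookup for the flat file, two for the grouped one; both give the same value.
assert flat["sample_names"] == grouped["samples"]["sample_names"]
sample_dir = grouped["directories"]["base_dir"] + grouped["directories"]["sample_dir"]
print(sample_dir)  # /base/Samples/
```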

Most of the discussions I've found about this error involve iteration, so I'm not sure why it occurs here with the YAML file.
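The error here comes from indexing, not iteration: somewhere a string is being indexed as if it were a dictionary. A cheap safeguard is to check each config section's type right after `configfile:` so a malformed file fails with a readable message. The helper below, `require_section`, is a hypothetical name in plain Python; it checks against `collections.abc.Mapping` so it also accepts dictionary subclasses a loader might return.

```python
from collections.abc import Mapping


def require_section(config, name, keys):
    """Return config[name] after checking it is a mapping that contains `keys`.

    Hypothetical helper: raises ValueError with a readable message instead of
    the bare "string indices must be integers" TypeError.
    """
    section = config.get(name)
    if not isinstance(section, Mapping):
        raise ValueError(
            f"config[{name!r}] should be a mapping but is {type(section).__name__}: "
            f"{section!r}. Check for 'key:value' lines missing a space after the colon."
        )
    missing = [key for key in keys if key not in section]
    if missing:
        raise ValueError(f"config[{name!r}] is missing keys: {missing}")
    return section


# Stand-ins for a mis-parsed and a correctly parsed config.
bad_config = {"samples": "sample_names:samples.csv sample_files:files.csv"}
good_config = {"samples": {"sample_names": "samples.csv", "sample_files": "files.csv"}}

samples_cfg = require_section(good_config, "samples", ["sample_names", "sample_files"])
print(samples_cfg["sample_names"])  # samples.csv

try:
    require_section(bad_config, "samples", ["sample_names"])
except ValueError as err:
    print(err)
```

In a Snakefile this would be called as `require_section(config, "samples", ["sample_names", "sample_files", "sample_comparisons"])`. For stricter checks, Snakemake also provides schema-based validation through `snakemake.utils.validate`.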
