Reputation: 27
I have written this rule in my Snakefile:
    rule extractfeat:
        input:
            '/path/to/file/{genome}.gbk'
        output:
            '{genome}_{locus_tag}_{gene}_{substrate}.fasta'
        shell:
            '''
            extractfeat {input} {output} -value {wildcards.genome}_{wildcards.locus_tag} -type CDS -describe product,locus_tag
            '''
I want to get all the output files listed in a separate file, where each row corresponds to one output file. The file looks like this:
genome locus_tag gene substrate
PalbDSM11370 02121 susC pululan
PalbDSM11370 02122 susD pululan
PalbDSM11370 01210 susC arabinan
PalbDSM11370 01209 susD arabinan
PalbDSM11370 02015 susC bglukan
PalbDSM11370 02016 susD bglukan
PpalDSM17968 00934 susC pululan
PpalDSM17968 00933 susD pululan
PpalDSM17968 02229 susC arabinan
PpalDSM17968 02228 susD arabinan
PpalDSM17968 01622 susC bglukan
PpalDSM17968 01623 susD bglukan
PREVCOP 05864 susC pululan
PREVCOP 05865 susD pululan
PREVCOP 05852 susC arabinan
PREVCOP 05851 susD arabinan
PREVCOP 05099 susC bglukan
PREVCOP 05098 susD bglukan
PREVCOP 03646 susC ksiloglukan
PREVCOP 03645 susD ksiloglukan
Psp.AGR2160 00839 susC ksiloglukan
Thanks
Upvotes: 0
Views: 62
Reputation: 2079
You can add this as another rule, with inputs dependent on all the outputs of your generating rule:
    rule tabulate:
        input: <ALL THE FASTA FILES>
        output: 'table.txt'
        run:
            wcs = glob_wildcards('{genome}_{locus_tag}_{gene}_{substrate}.fasta')
            with open(output[0], 'w') as outfile:
                outfile.write('genome\tlocus_tag\tgene\tsubstrate\n')  # header
                for row in zip(*wcs):  # order will match order in wildcard string
                    outfile.write('\t'.join(row) + '\n')
Or, if you already have lists of those wildcard values, you can write them out directly instead of using glob_wildcards.
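If the wildcard values already live in a table like the one in the question, something along these lines might work. It is only a sketch in plain Python, assuming a whitespace-separated table with a header row. The names `TABLE`, `read_rows`, `fasta_names` and `write_table` are made up for illustration and are not part of Snakemake.

```python
import io

# Hypothetical stand-in for the table file from the question
# (whitespace-separated, first line is the header).
TABLE = """genome locus_tag gene substrate
PalbDSM11370 02121 susC pululan
PalbDSM11370 02122 susD pululan
"""


def read_rows(handle):
    """Split a whitespace-separated table into (header, rows)."""
    lines = [line.split() for line in handle if line.strip()]
    return lines[0], lines[1:]


def fasta_names(rows):
    """Build one file name per row, following the output pattern of extractfeat."""
    return ['{}_{}_{}_{}.fasta'.format(*row) for row in rows]


def write_table(header, rows, handle):
    """Write the header and rows back out as a tab-separated table."""
    handle.write('\t'.join(header) + '\n')
    for row in rows:
        handle.write('\t'.join(row) + '\n')


header, rows = read_rows(io.StringIO(TABLE))
targets = fasta_names(rows)  # one FASTA file name per table row
```

In a Snakefile you would presumably open the real table file instead of the `io.StringIO` stand-in, and a list like `targets` could then serve as the `input:` of an aggregating rule such as `tabulate`.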
Upvotes: 1