Reputation: 443
I have a simple Python function that takes an input file and creates an output file:
def enlarge_overlapping_region(input,output):
    fi=open(input,"r")
    fo=open(output,"w")
    df = pd.read_table(fi, delimiter='\t',header=None,names=["chr","start","end","point","score","strand","cdna_count","lib_count","region_type","region_id"])
    df1 = (df.groupby('region_id', as_index=False)
             .agg({'chr':'first', 'start':'min', 'end':'max','region_type':'first'})
             [['chr','start','end','region_type','region_id']])
    df1 = df1[df1.region_id != "."]
    df1.to_csv(fo,index=False, sep='\t')
    return(df1)
I call this function in a Snakemake rule, but I cannot access the file and I don't know why.
I tried something like this:
rule get_enlarged_dhs:
    input:
        "data/annotated_clones/{cdna}_paste_{lib}.annotated.bed"
    output:
        "data/enlarged_coordinates/{cdna}/{cdna}_paste_{lib}.enlarged_dhs.bed"
    run:
        lambda wildcards: enlarge_overlapping_region(f"{wildcards.input}",f"{wildcards.output}")
I got this error:
Missing files after 5 seconds:
data/enlarged_coordinates/pPGK_rep1/pPGK_rep1_paste_pPGK_input.enlarged_dhs.bed
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
If I put the Python code directly into the rule like this:
rule get_enlarged_dhs:
    input:
        "data/annotated_clones/{cdna}_paste_{lib}.annotated.bed"
    output:
        "data/enlarged_coordinates/{cdna}/{cdna}_paste_{lib}.enlarged_dhs.bed"
    run:
        fi=open(input,"r")
        fo=open(output,"w")
        df = pd.read_table(fi, delimiter='\t',header=None,names=["chr","start","end","point","score","strand","cdna_count","lib_count","region_type","region_id"])
        df1 = (df.groupby('region_id', as_index=False)
                 .agg({'chr':'first', 'start':'min', 'end':'max','region_type':'first'})
                 [['chr','start','end','region_type','region_id']])
        df1 = df1[df1.region_id != "."]
        df1.to_csv(fo,index=False, sep='\t')
I got this error:
expected str, bytes or os.PathLike object, not InputFiles
Upvotes: 1
Views: 176
Reputation: 9062
It's simpler than you think, probably:
lambda wildcards: enlarge_overlapping_region(f"{wildcards.input}",f"{wildcards.output}")
Should be:
enlarge_overlapping_region(input[0], output[0])
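The reason the original rule produced no file: the body of run is ordinary Python, so a bare lambda expression only builds a function object and never calls it. Here is a minimal sketch of that behaviour in plain Python, without Snakemake. The function below is a hypothetical stand-in that only records its arguments, not the real implementation:

```python
calls = []  # records every invocation of the stand-in function


def enlarge_overlapping_region(input_path, output_path):
    # Hypothetical stand-in for the real function: it only records its arguments.
    calls.append((input_path, output_path))


# What the original run block did: build a lambda and discard it.
# The body is never evaluated, so nothing is called and no file is written.
lambda wildcards: enlarge_overlapping_region(wildcards.input, wildcards.output)
calls_after_lambda = list(calls)

# What the fix does: call the function directly.
enlarge_overlapping_region("in.bed", "out.bed")
calls_after_call = list(calls)

print(calls_after_lambda)
print(calls_after_call)
```

Because the lambda body is never evaluated, the nonexistent wildcards.input and wildcards.output attributes do not even raise an error; Snakemake only notices afterwards that the output file is missing.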
Similarly, to fix the second solution you tried, change:
fi=open(input,"r")
fo=open(output,"w")
to:
fi=open(input[0],"r")
fo=open(output[0],"w")
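The indexing matters because input and output inside a rule are list-like collections of paths, not single strings, which is what the "not InputFiles" error is reporting. A rough sketch of the same failure outside Snakemake follows. FakeInputFiles is a hypothetical stand-in (a plain list subclass), not Snakemake's actual class:

```python
import os
import tempfile


class FakeInputFiles(list):
    """Hypothetical stand-in for Snakemake's list-like input object."""


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.bed")
    with open(path, "w") as fh:
        fh.write("chr1\t10\t20\n")

    files = FakeInputFiles([path])

    # Passing the whole collection to open() fails, as in the question.
    try:
        open(files, "r")
        error = None
    except TypeError as exc:
        error = str(exc)

    # Indexing yields a plain string path that open() accepts.
    with open(files[0], "r") as fh:
        first_line = fh.readline()

print(error)
print(repr(first_line))
```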
In my opinion, it's less error-prone to assign a name to input and output files and use that name in the run or shell directives. E.g.:
rule get_enlarged_dhs:
    input:
        bed= "...",
    output:
        bed= "...",
    run:
        enlarge_overlapping_region(input.bed, output.bed)
Upvotes: 3