Reputation: 1644
I am very new to OOP, so this question may look very amateur to experienced OOP people. I have a number of text files, up to 250M lines long, and I am planning to generate reports based on the values in the columns of these files. The files look as follows:
chr1 54071 5 0 8 0
chr1 54072 5 0 9 0
chr1 54073 5 0 9 0
chr1 54074 5 0 9 0
chr1 54075 5 0 9 0
chr1 54076 5 0 9 0
chr1 54077 5 0 9 0
chr1 54078 5 0 9 0
chr1 54079 5 0 10 0
chr1 54080 5 0 10 0
chr1 54081 5 0 10 0
chr1 54082 5 0 10 0
chr1 54083 5 0 10 0
chr1 54084 5 0 10 0
chr1 54085 5 0 11 0
chr1 54086 5 0 11 0
chr1 54087 5 0 11 0
chr1 54088 5 0 11 0
chr1 54089 5 0 12 0
Here col1 is a chromosome, col2 is a position in the chromosome (from 1 to 250M), and each remaining column holds the value for one sample at that position.
The function is supplied with two arguments: one is the file containing data such as in the example above, the other is a list of samples such as ["AE","BE","HE","C"], in the order that they appear in the data file.
The report should generate a summary output for each sample and each combination of samples where the value of the sample col is larger than a given value, say '2' for instance. The report looks like:
Sample BasesCovered FractionOfTotal
AE 43954 0.43954
BE 18728 0.18728
HE 33780 0.3378
C 8108 0.08108
AE:BE 17576 0.17576
AE:HE 28818 0.28818
AE:C 7268 0.07268
BE:HE 13694 0.13694
BE:C 4349 0.04349
HE:C 4827 0.04827
AE:BE:HE 12873 0.12873
AE:BE:C 4263 0.04263
AE:HE:C 4634 0.04634
BE:HE:C 2831 0.02831
AE:BE:HE:C 2750 0.0275
TotalSize 100000 1.00
I have achieved this using a generator and functional programming but would like to learn OOP so am trying to implement this in OOP by making a 'report' object that gets updated with each yield of the generator. My functional code to initiate the report looks like this:
from itertools import combinations

def initiate_overlap_dict(SAMPLE_LIST):
    # Takes a list or a string and converts it into a dict of all combinations, initiates the value of the dict as integer 0
    if len(SAMPLE_LIST)==1 and type(SAMPLE_LIST)==list:
        return {SAMPLE_LIST[0]: 0}
    elif len(SAMPLE_LIST)==0 and type(SAMPLE_LIST)==list:
        raise Exception('"SAMPLE_LIST" needs to contain samples!')
    elif type(SAMPLE_LIST) != list:
        raise Exception('"SAMPLE_LIST" must be a list of length >=1 in the same order as they appear in the depth_file')
    else:
        sample_list=[str(x) for x in SAMPLE_LIST]
        out={}
        for s in sample_list:
            out[s]=0
        for c in range(2,len(sample_list)+1):
            for s in combinations(sample_list,c):
                out[':'.join(s)]=0
        return out
I simply call this at the start of the program and then update it with each yield of the generator. I'd like to do something similar with OOP and have tried the following:
from itertools import combinations

class CoverageReport(object):
    # CONSTRUCTOR
    def __init__(self, samples):
        self.samples = samples # list of samples
        self.coverage = self.initiate_overlap_dict(self)

    # REPRESENTATION METHOD: WHAT WILL BE PRINTED BY DEFAULT IF THE OBJECT IS CALLED
    def __repr__(self):
        return '<The following samples are examined for coverage: ' + self.samples +'>'

    def initiate_overlap_dict(self):
        # Takes a list or a string and converts it into a dict of all combinations, initiates the value of the dict as integer 0
        if len(self.samples)==1 and type(self.samples)==list:
            return {self.samples[0]: 0}
        elif len(self.samples)==0 and type(self.samples)==list:
            raise Exception('"SAMPLE_LIST" needs to contain samples!')
        elif type(self.samples) != list:
            raise Exception('"SAMPLE_LIST" must be a list of length >=1 in the same order as they appear in the depth_file')
        else:
            sample_list=[str(x) for x in self.samples]
            out={}
            for s in sample_list:
                out[s]=0
            for c in range(2,len(sample_list)+1):
                for s in combinations(sample_list,c):
                    out[':'.join(s)]=0
            return out

report=CoverageReport(["AE","BE","HE","C"])
Basically, I'm trying to get the object to initiate itself with values of 0 for each item and combination of items in the list, so that I can then make an update method that will update it on each iteration of the generator. It's throwing the following error:
TypeError: initiate_overlap_dict() takes 1 positional argument but 2 were given
I figure this is something to do with trying to initiate self.coverage in the init and not giving an argument to create the object. Is there a way to do this using the list (self.samples), as this should be all it needs to initiate an empty/unpopulated report?
Is there a way to do this? I'm sure someone with even basic OOP skills can answer this fairly easily? I'm a bit stumped with what exactly to search for is all. Many thanks
Upvotes: 0
Views: 24
Reputation: 703
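You don't need to call self.initiate_overlap_dict(self); you only need to call self.initiate_overlap_dict(), i.e. remove the self argument. When a method is called on an instance, Python passes that instance as the first argument automatically, so writing self again supplies it twice, which is exactly what the TypeError is reporting.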
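To see the same error in isolation, here is a minimal sketch. The Demo class and its method are hypothetical names used only for illustration; they are not part of the question's code.

```python
class Demo:
    def method(self):
        # `self` is filled in automatically when called through an instance.
        return "called on " + type(self).__name__


d = Demo()
ok = d.method()          # the instance d is passed implicitly as self
same = Demo.method(d)    # explicit form of the very same call

try:
    d.method(d)          # d is now passed twice: implicitly and explicitly
except TypeError as exc:
    error_message = str(exc)
```

The last call fails for the same reason as self.initiate_overlap_dict(self) in the constructor.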
You can read more here https://pythontips.com/2013/08/07/the-self-variable-in-python-explained/
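For completeness, here is one way the class could look with that fix applied. Treat it as a sketch rather than the definitive design: the threshold parameter, the total counter, the update method and the leading underscore on the helper are my own assumptions about how you might want to feed it from your generator. It also joins the sample names in __repr__, because concatenating a str with a list (as in the question) would raise its own TypeError as soon as the object is printed.

```python
from itertools import combinations


class CoverageReport:
    """Counts positions covered above a threshold, per sample combination."""

    def __init__(self, samples, threshold=2):
        if not isinstance(samples, list) or not samples:
            raise ValueError("samples must be a non-empty list, in file column order")
        self.samples = [str(s) for s in samples]
        self.threshold = threshold   # assumed parameter: a value must exceed this
        self.total = 0               # assumed counter: number of positions seen
        self.coverage = self._initiate_overlap_dict()   # note: no explicit self

    def __repr__(self):
        return "<CoverageReport samples=" + ", ".join(self.samples) + ">"

    def _initiate_overlap_dict(self):
        # One zeroed entry per sample and per combination of samples.
        out = {}
        for size in range(1, len(self.samples) + 1):
            for combo in combinations(self.samples, size):
                out[":".join(combo)] = 0
        return out

    def update(self, values):
        # `values` holds the per-sample numbers for one position, in sample order.
        self.total += 1
        covered = [s for s, v in zip(self.samples, values) if v > self.threshold]
        for size in range(1, len(covered) + 1):
            for combo in combinations(covered, size):
                self.coverage[":".join(combo)] += 1


report = CoverageReport(["AE", "BE", "HE", "C"])
# The first two rows of the example data stand in for the generator's output.
for line in ["chr1 54071 5 0 8 0", "chr1 54072 5 0 9 0"]:
    fields = line.split()
    report.update([int(x) for x in fields[2:]])
```

Because covered keeps the samples in their original order, the keys built in update always match the ones created in the constructor. Dividing each count by report.total would give the FractionOfTotal column.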
Upvotes: 1