Reputation: 1189
I have the feeling this must have been asked before but I might lack the vocabulary to search and describe my problem.
I have made a Python3 class that takes a directory as input and scrapes a lot of data together into a pandas.DataFrame, such that I can do this:
mymodule.myclass('/some/dir').get_tpm_values()
And get a pd.DataFrame with some columns and rows, like this:
>>> seqit.Seqrun(41).get_tpm_values()
                 0041_P2017BB2S5R_S1  0041_P2017BB2S3R_S2  0041_P2017BB2S4R_S3  0041_P2017BB2S8R_S4  0041_P2017BB5S10R_S5
gene_id
ENSG00000000003                53.72                19.31                11.03                33.35                 14.55
ENSG00000000005                 1.05                 0.34                 0.19                 0.84                  0.12
ENSG00000000419                13.35                12.66                11.93                17.61                 22.82
Now this DataFrame is special: it always contains genes in the index and samples as the columns. As such, I can write methods that make sense for this returned DataFrame but not for an arbitrary DataFrame. For example, I'd like to be able to add Hugo symbols to the index and save to Excel like this:
mymodule.myclass('/some/dir').get_tpm_values().add_hugo_symbols_to_index().to_excel('some_excel.xlsx')
This would mean I need to add methods to pandas DataFrames, but only to those returned by my class. How would I do that?
Edit: it may be helpful to post part of my class:
class Myclass():
    """
    A class that gives one a handle on a Snakemake sequencing data analysis
    folder
    """
    def __init__(self, seqrun_dir):
        if isinstance(seqrun_dir, int):
            self.seqrun_dir = self.number2seqrun(seqrun_dir)
        else:
            self.seqrun_dir = seqrun_dir
        self.name = os.path.split(self.seqrun_dir)[-1]
        self.quantification_data_loaded = False
        self.pctpm_values_loaded = False
        self.load_sample_table()

    def get_tpm_values(self):
        """
        Get a pd.DataFrame with the TPM values from loaded quantification_data dictionary
        """
        if not self.quantification_data_loaded:
            self.get_quantification_data()
        self.tpm_values = dict()
        for sample in self.samples:
            try:
                self.tpm_values[sample] = self.quantification_data[sample]['TPM']
            except KeyError:
                print('Filling column', sample, 'with NaNs')
                self.tpm_values[sample] = np.nan
        self.tpm_values = pd.DataFrame(self.tpm_values)
        self.tpm_values_loaded = True
        return self.tpm_values
Upvotes: 0
Views: 1456
Reputation: 2590
If I understand your question correctly, you want to add a method to the DataFrame class. A reference for this can be found here.
In my opinion, the best way to solve this is to create your own DataFrame class that inherits from pandas.DataFrame and implements an additional method. See the code below for an example:
class HugoDataFrame(pd.DataFrame):
    def add_hugo_symbols_to_index(self):
        pass  # Do your stuff here
Then, instead of creating a plain DataFrame and returning it, create a HugoDataFrame:
self.tpm_values = HugoDataFrame(self.tpm_values)
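One caveat with subclassing: many pandas operations (slicing, copying, and so on) hand back a plain DataFrame unless the subclass overrides the `_constructor` property. Below is a minimal sketch of what a fuller subclass might look like. The `symbols` argument (a dict from gene ID to Hugo symbol) and the MultiIndex layout are assumptions made for illustration, not something taken from the question.

```python
import pandas as pd


class HugoDataFrame(pd.DataFrame):
    """DataFrame with genes in the index and samples as the columns."""

    @property
    def _constructor(self):
        # Tell pandas to build HugoDataFrame objects when an operation
        # (slicing, copy, ...) creates a new frame from this one.
        return HugoDataFrame

    def add_hugo_symbols_to_index(self, symbols):
        # `symbols` is a hypothetical dict: gene_id -> Hugo symbol.
        out = self.copy()
        hugo = out.index.map(lambda gene_id: symbols.get(gene_id, ""))
        out.index = pd.MultiIndex.from_arrays(
            [out.index, hugo], names=["gene_id", "hugo_symbol"]
        )
        return out


# Illustrative data only; the mapping would normally come from an annotation file.
symbols = {"ENSG00000000003": "TSPAN6", "ENSG00000000005": "TNMD"}
tpm = HugoDataFrame(
    {"sample_1": [53.72, 1.05], "sample_2": [19.31, 0.34]},
    index=pd.Index(["ENSG00000000003", "ENSG00000000005"], name="gene_id"),
)
annotated = tpm.add_hugo_symbols_to_index(symbols)
```

Because the method returns a new frame rather than modifying in place, the chained call from the question, ending in `.to_excel('some_excel.xlsx')`, should work as written.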
Your other option would be to simply move this functionality to a separate function that takes a DataFrame and returns the modified one. Then, instead of
mymodule.myclass('/some/dir').get_tpm_values().add_hugo_symbols_to_index().to_excel('some_excel.xlsx')
you call
add_hugo_symbols_to_index(mymodule.myclass('/some/dir').get_tpm_values()).to_excel('some_excel.xlsx')
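If you prefer to keep the left-to-right chaining with the standalone-function approach, `DataFrame.pipe` passes the frame as the first argument to any function. The sketch below assumes a hypothetical `symbols` dict mapping gene IDs to Hugo symbols; how that mapping is actually obtained is left open.

```python
import pandas as pd


def add_hugo_symbols_to_index(df, symbols):
    """Return a copy of `df` whose index also carries the Hugo symbol.

    `symbols` is a hypothetical dict: gene_id -> Hugo symbol.
    """
    out = df.copy()
    hugo = out.index.map(lambda gene_id: symbols.get(gene_id, ""))
    out.index = pd.MultiIndex.from_arrays(
        [out.index, hugo], names=["gene_id", "hugo_symbol"]
    )
    return out


# Illustrative data standing in for the result of get_tpm_values().
symbols = {"ENSG00000000003": "TSPAN6", "ENSG00000000005": "TNMD"}
tpm = pd.DataFrame(
    {"sample_1": [53.72, 1.05], "sample_2": [19.31, 0.34]},
    index=pd.Index(["ENSG00000000003", "ENSG00000000005"], name="gene_id"),
)

# pipe() calls add_hugo_symbols_to_index(tpm, symbols) and returns its result,
# so further methods such as .to_excel(...) can be chained afterwards.
result = tpm.pipe(add_hugo_symbols_to_index, symbols)
```

This keeps the function usable on any DataFrame with gene IDs in the index, without touching the pandas class hierarchy.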
Upvotes: 1