Reputation: 385
I have a function that generates a list of gene names for each sample. I want to save this list and reuse it later. However, before dumping to the pickle file, I would like to read the file first and dump only the genes that are not already in it. I don't want the pickle file to contain repeated genes, because over time that would turn it into a huge file.
e.g.
If my pickled file already contains the following genes: 'a', 'ab', 'ac' and the list I have newly created is:
unique_genes_list = ["a", "ab", "ac", "ad"]
Then I only want to dump 'ad' to the pickle. Is there a nice way of doing this?
Thanks
Upvotes: 0
Views: 995
Reputation: 7170
If your goal is to add the new gene, 'ad', to the existing list of genes, here is how you can read in your old data from the pickle, add the new gene, and re-pickle:
import pickle

unique_genes_list = ["a", "ab", "ac", "ad"]

# Load the genes saved earlier
with open('some/path', 'rb') as in_pickle:
    old_data = pickle.load(in_pickle)  # ["a", "ab", "ac"]

# Add 'ad' and any other genes not already stored
old_data.extend([x for x in unique_genes_list if x not in old_data])

# Save the combined data, overwriting the old pickle
with open('some/path', 'wb') as out_pickle:
    pickle.dump(old_data, out_pickle)  # dumps ["a", "ab", "ac", "ad"]
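The snippet above assumes the pickle file already exists. Here is a minimal sketch of the same idea wrapped in a function that also copes with the first run, when there is no file yet, and uses a set so the membership check stays fast for long gene lists. The name `merge_into_pickle` is made up for illustration, and the sketch assumes the file holds a plain list of strings.

```python
import pickle


def merge_into_pickle(path, new_genes):
    """Add genes from new_genes that are not yet stored in the pickle at path.

    Returns the list of genes that were actually added.
    """
    try:
        with open(path, "rb") as in_pickle:
            stored = pickle.load(in_pickle)
    except FileNotFoundError:
        stored = []  # first run: nothing has been saved yet

    seen = set(stored)  # set lookups avoid rescanning the list for every gene
    added = []
    for gene in new_genes:
        if gene not in seen:
            seen.add(gene)
            added.append(gene)

    if added:  # only rewrite the file when something changed
        stored.extend(added)
        with open(path, "wb") as out_pickle:
            pickle.dump(stored, out_pickle)
    return added
```

Calling it twice with overlapping lists leaves each gene in the file once, in the order it was first seen.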
If you only want to retrieve the new genes and pickle just ['ad'], here is how you would do that:
import pickle

unique_genes_list = ["a", "ab", "ac", "ad"]

# Load the genes saved earlier
with open('some/path', 'rb') as in_pickle:
    old_data = pickle.load(in_pickle)  # ["a", "ab", "ac"]

# Collect just 'ad' and any other genes not already stored
new_genes = [x for x in unique_genes_list if x not in old_data]  # ["ad"]

# Save only the new genes to a separate file
with open('some/new/path', 'wb') as out_pickle:
    pickle.dump(new_genes, out_pickle)  # dumps ["ad"]
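If you would rather keep everything in one file and still write only the new genes each time, one option is to append a fresh pickle to the same file and read them all back in a loop. This is only a sketch of that alternative, not something from the question: `load_all` and `append_new` are hypothetical names, and it assumes every object in the file is a list of gene names.

```python
import pickle


def load_all(path):
    """Read every list pickled back to back in path and flatten them into one list."""
    genes = []
    try:
        with open(path, "rb") as fh:
            while True:
                try:
                    genes.extend(pickle.load(fh))
                except EOFError:
                    break  # no more pickled objects in the file
    except FileNotFoundError:
        pass  # nothing has been saved yet
    return genes


def append_new(path, candidates):
    """Append only the genes not already stored in path; return what was appended."""
    seen = set(load_all(path))
    new_genes = []
    for gene in candidates:
        if gene not in seen:
            seen.add(gene)
            new_genes.append(gene)
    if new_genes:
        # "ab" adds a new pickled list after the existing ones instead of overwriting
        with open(path, "ab") as fh:
            pickle.dump(new_genes, fh)
    return new_genes
```

The trade-off is that the file can no longer be read with a single `pickle.load`; every reader has to loop until `EOFError`, as `load_all` does.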
Upvotes: 1