Mark

Reputation: 385

python pickle: dump only unique list to pickled file

I have a function that generates a list of gene names for each sample. I want to save this list and reuse it later. However, before dumping to the pickle file, I would like to read the file first and dump only the genes that are not already in it. I don't want the pickle file to contain repeated genes, because over time that would make the file huge.

e.g.

If my pickled file already contains the following genes: 'a', 'ab', 'ac' and the list I have newly created is:

unique_genes_list = ["a", "ab", "ac", "ad"]

Then I only want to dump 'ad' to the pickle file. Is there a clean way of doing this?

Thanks

Upvotes: 0

Views: 995

Answers (1)

TayTay

Reputation: 7170

If your goal is to add the new gene, 'ad', to the existing list of genes, here is how you can read your old data from the pickle file, add the new gene, and re-pickle:

import pickle

unique_genes_list = ["a", "ab", "ac", "ad"]

# Load the previously pickled genes
with open('some/path', 'rb') as in_pickle:
    old_data = pickle.load(in_pickle)  # ["a", "ab", "ac"]

# Add 'ad' and any other genes not already stored
old_data.extend([x for x in unique_genes_list if x not in old_data])

# Save the combined data, overwriting the old pickle
with open('some/path', 'wb') as out_pickle:
    pickle.dump(old_data, out_pickle)  # dumps ["a", "ab", "ac", "ad"]
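
The read, merge, and re-pickle steps can also be wrapped in a small helper. This is only a sketch: the function name `merge_genes` is hypothetical, and it assumes the pickle holds a single list of strings and that a missing file should be treated as an empty list. It uses a set for the membership test so that long gene lists are not scanned repeatedly.

```python
import os
import pickle


def merge_genes(path, new_genes):
    """Add genes from new_genes that are not yet stored at path.

    Hypothetical helper: assumes the pickle at path holds one list of strings.
    Returns the merged list that was written back.
    """
    if os.path.exists(path):
        with open(path, "rb") as in_pickle:
            stored = pickle.load(in_pickle)
    else:
        stored = []  # assumption: first run, nothing pickled yet

    seen = set(stored)  # set lookup instead of scanning the list each time
    for gene in new_genes:
        if gene not in seen:
            seen.add(gene)
            stored.append(gene)  # keeps the original insertion order

    # Overwrite the old pickle with the merged list
    with open(path, "wb") as out_pickle:
        pickle.dump(stored, out_pickle)
    return stored
```

Calling `merge_genes(path, unique_genes_list)` after each sample keeps one deduplicated list in the file, at the cost of rewriting the whole file every time.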

Edit:

If you're hoping to retrieve only the genes that are not already stored and pickle just ['ad'], then here's how you would do that:

import pickle

unique_genes_list = ["a", "ab", "ac", "ad"]

# Load the previously pickled genes
with open('some/path', 'rb') as in_pickle:
    old_data = pickle.load(in_pickle)  # ["a", "ab", "ac"]

# Collect just 'ad' and any other gene not already stored
new_genes = [x for x in unique_genes_list if x not in old_data]  # ['ad']

# Save only the new genes to a separate file
with open('some/new/path', 'wb') as out_pickle:
    pickle.dump(new_genes, out_pickle)  # dumps ["ad"]
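
If the new genes should go into the same file rather than a separate one, one possible approach is to append a new pickle record each time and read all records back in a loop. The sketch below assumes that layout (a file containing several pickled lists written one after another); the helper names `load_all_genes` and `append_new_genes` are hypothetical and not part of the `pickle` module.

```python
import os
import pickle


def load_all_genes(path):
    """Return the set of genes from every record pickled in path.

    Assumes the file holds zero or more pickled lists written back to back.
    """
    genes = set()
    if not os.path.exists(path):
        return genes
    with open(path, "rb") as in_pickle:
        while True:
            try:
                genes.update(pickle.load(in_pickle))  # reads one record
            except EOFError:
                break  # no more records in the file
    return genes


def append_new_genes(path, candidate_genes):
    """Append only the genes not already stored; return the appended list."""
    known = load_all_genes(path)
    new_genes = []
    for gene in candidate_genes:
        if gene not in known:
            known.add(gene)
            new_genes.append(gene)
    if new_genes:
        # 'ab' adds a new record without rewriting the existing ones
        with open(path, "ab") as out_pickle:
            pickle.dump(new_genes, out_pickle)
    return new_genes
```

With the example from the question, a file that already holds 'a', 'ab', and 'ac' would only get the record ['ad'] appended. The trade-off is that the file must always be read with the loop in `load_all_genes`, since a single `pickle.load` call returns only the first record.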

Upvotes: 1
