glitterbox

Reputation: 43

Accessing information from a dictionary

I have a dictionary; this is the first part of it (it's very large):

           {'cluster-1': {'BGCid': '-',
           'cdhitclusters': [{'genes': {'AT1G24070': 100.0},
                              'rep_gene': 'AT1G24070'},
                             {'genes': {'AT1G24100': 100.0},
                              'rep_gene': 'AT1G24100'},
                             {'genes': {'AT1G24040': 100.0,
                                        'AT1G2404_1': 100.0,
                                        'AT1G2404_2': 100.0},
                              'rep_gene': 'AT1G24040'},
                             {'genes': {'AT1G24020': 100.0,
                                        'AT1G2402_1': 100.0},
                              'rep_gene': 'AT1G24020'},
                             {'genes': {'AT1G24010': 100.0},
                              'rep_gene': 'AT1G24010'},
                             {'genes': {'AT1G24000': 100.0},
                              'rep_gene': 'AT1G24000'}],

I want to print the values stored under the key(?) 'rep_gene', but I get an error saying 'rep_gene' isn't a key. What is rep_gene, and how can I build a DataFrame holding the rep_gene information?

EDIT

The first two lines work, but the last one raises: AttributeError: 'list' object has no attribute 'get'

clus1 = gene_clusters.get("cluster-1")
cdhit1 = clus1.get("cdhitclusters")
cdhit1.get("rep_gene")
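A quick way to see where the chain breaks is to print the type returned at each step. This sketch uses a trimmed copy of the dictionary above (only the first two cdhitclusters entries), so it runs on its own:

```python
# Trimmed copy of the structure from the question (first two entries only).
gene_clusters = {
    "cluster-1": {
        "BGCid": "-",
        "cdhitclusters": [
            {"genes": {"AT1G24070": 100.0}, "rep_gene": "AT1G24070"},
            {"genes": {"AT1G24100": 100.0}, "rep_gene": "AT1G24100"},
        ],
    }
}

clus1 = gene_clusters.get("cluster-1")
cdhit1 = clus1.get("cdhitclusters")

# Inspect what each step returns.
print(type(clus1).__name__)      # dict
print(type(cdhit1).__name__)     # list -> lists have no .get()
print(type(cdhit1[0]).__name__)  # dict -> each element does have .get()

# Index into the list first, then look up the key.
print(cdhit1[0].get("rep_gene"))  # AT1G24070
```

The value under "cdhitclusters" is a list of dictionaries, so "rep_gene" is a key of each element, not of the list itself.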

Upvotes: 0

Views: 31

Answers (2)

jsmart

Reputation: 3001

Here is a way using Counter from the built-in collections package:

# use the list of dicts from above
cd_hit_clusters = [
    {'genes': {'AT1G24070': 100.0}, 'rep_gene': 'AT1G24070'},
    {'genes': {'AT1G24100': 100.0}, 'rep_gene': 'AT1G24100'},
    {'genes': {'AT1G24040': 100.0, 'AT1G2404_1': 100.0, 'AT1G2404_2': 100.0}, 'rep_gene': 'AT1G24040'},
    {'genes': {'AT1G24020': 100.0, 'AT1G2402_1': 100.0}, 'rep_gene': 'AT1G24020'},
    {'genes': {'AT1G24010': 100.0}, 'rep_gene': 'AT1G24010'},
    {'genes': {'AT1G24000': 100.0}, 'rep_gene': 'AT1G24000'},
]

Now use Counter:

from collections import Counter
rep_gene_list = [cd['rep_gene'] for cd in cd_hit_clusters]
Counter(rep_gene_list)

# results
Counter({'AT1G24070': 1,
         'AT1G24100': 1,
         'AT1G24040': 1,
         'AT1G24020': 1,
         'AT1G24010': 1,
         'AT1G24000': 1})
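The list above covers only cluster-1. If the full dictionary has more clusters (assumed here; the question shows only the first part), the same idea can walk every cluster and feed a single Counter. A sketch, using a trimmed copy of cluster-1 so it runs standalone:

```python
from collections import Counter

# Same shape as the question's dictionary; only part of "cluster-1" is filled in.
gene_clusters = {
    "cluster-1": {
        "BGCid": "-",
        "cdhitclusters": [
            {"genes": {"AT1G24070": 100.0}, "rep_gene": "AT1G24070"},
            {"genes": {"AT1G24100": 100.0}, "rep_gene": "AT1G24100"},
            {"genes": {"AT1G24040": 100.0, "AT1G2404_1": 100.0,
                       "AT1G2404_2": 100.0}, "rep_gene": "AT1G24040"},
        ],
    },
}

# Walk every cluster, then every cdhit entry inside it, collecting rep_gene.
# .get("cdhitclusters", []) skips clusters that have no such list.
rep_gene_counts = Counter(
    entry["rep_gene"]
    for cluster in gene_clusters.values()
    for entry in cluster.get("cdhitclusters", [])
)
print(rep_gene_counts)
```

Counts above 1 would only show up if the same rep_gene appears in more than one cluster.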

Upvotes: 1

DownloadPizza

Reputation: 3466

Your cdhit1 contains the following list:

[
   {
      "genes":{
         "AT1G24070":100.0
      },
      "rep_gene":"AT1G24070"
   },
   {
      "genes":{
         "AT1G24100":100.0
      },
      "rep_gene":"AT1G24100"
   },
   {
      "genes":{
         "AT1G24040":100.0,
         "AT1G2404_1":100.0,
         "AT1G2404_2":100.0
      },
      "rep_gene":"AT1G24040"
   },
   {
      "genes":{
         "AT1G24020":100.0,
         "AT1G2402_1":100.0
      },
      "rep_gene":"AT1G24020"
   },
   {
      "genes":{
         "AT1G24010":100.0
      },
      "rep_gene":"AT1G24010"
   },
   {
      "genes":{
         "AT1G24000":100.0
      },
      "rep_gene":"AT1G24000"
   }
]

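So you need to specify which index you want to use before calling .get(): cdhit1 is a list of dictionaries, and only those dictionaries have a "rep_gene" key. I have never worked with pandas, but try cdhit1[0] and see what it returns; cdhit1[0].get("rep_gene") should give 'AT1G24070'. As you may notice, multiple elements have "rep_gene" as a key, so to get all of them you need to loop over the list.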
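Since the question also asks for a DataFrame: looping over every entry (rather than picking one index) gives a list of row dictionaries. A sketch, assuming the column names cluster, rep_gene and n_genes (chosen here for illustration, not taken from the question), on a trimmed copy of the data:

```python
# Trimmed copy of the question's dictionary (first three cdhit entries of cluster-1).
gene_clusters = {
    "cluster-1": {
        "BGCid": "-",
        "cdhitclusters": [
            {"genes": {"AT1G24070": 100.0}, "rep_gene": "AT1G24070"},
            {"genes": {"AT1G24100": 100.0}, "rep_gene": "AT1G24100"},
            {"genes": {"AT1G24040": 100.0, "AT1G2404_1": 100.0,
                       "AT1G2404_2": 100.0}, "rep_gene": "AT1G24040"},
        ],
    },
}

# One record per cdhit entry: the cluster key, its rep_gene,
# and how many genes that entry groups together (hypothetical extra column).
rows = []
for cluster_id, cluster in gene_clusters.items():
    for entry in cluster.get("cdhitclusters", []):
        rows.append({
            "cluster": cluster_id,
            "rep_gene": entry.get("rep_gene"),
            "n_genes": len(entry.get("genes", {})),
        })

for row in rows:
    print(row)
```

A list of dicts like rows can be passed straight to pandas.DataFrame(rows), which creates one column per key; that step needs pandas installed.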

Upvotes: 1
