Reputation: 43
I have a dictionary, and this is the first part of it (it's very large):
{'cluster-1': {'BGCid': '-',
'cdhitclusters': [{'genes': {'AT1G24070': 100.0},
'rep_gene': 'AT1G24070'},
{'genes': {'AT1G24100': 100.0},
'rep_gene': 'AT1G24100'},
{'genes': {'AT1G24040': 100.0,
'AT1G2404_1': 100.0,
'AT1G2404_2': 100.0},
'rep_gene': 'AT1G24040'},
{'genes': {'AT1G24020': 100.0,
'AT1G2402_1': 100.0},
'rep_gene': 'AT1G24020'},
{'genes': {'AT1G24010': 100.0},
'rep_gene': 'AT1G24010'},
{'genes': {'AT1G24000': 100.0},
'rep_gene': 'AT1G24000'}],
I want to print the information held by the key(?) 'rep_gene', but Python says 'rep_gene' isn't a key. What is rep_gene, and how can I make a dataframe holding the rep_gene information?
EDIT
The first 2 lines work but the final one returns: AttributeError: 'list' object has no attribute 'get'
clus1 = (gene_clusters.get("cluster-1"))
cdhit1 = (clus1.get("cdhitclusters"))
cdhit1.get("rep_gene")
Upvotes: 0
Views: 31
Reputation: 3001
Here is a way using Counter from the built-in collections module:
# use list-of-dict from above
cd_hit_clusters = [
{'genes': {'AT1G24070': 100.0}, 'rep_gene': 'AT1G24070'},
{'genes': {'AT1G24100': 100.0}, 'rep_gene': 'AT1G24100'},
{'genes': {'AT1G24040': 100.0, 'AT1G2404_1': 100.0, 'AT1G2404_2': 100.0}, 'rep_gene': 'AT1G24040'},
{'genes': {'AT1G24020': 100.0, 'AT1G2402_1': 100.0}, 'rep_gene': 'AT1G24020'},
{'genes': {'AT1G24010': 100.0}, 'rep_gene': 'AT1G24010'},
{'genes': {'AT1G24000': 100.0}, 'rep_gene': 'AT1G24000'}
]
Now use Counter:
from collections import Counter
rep_gene_list = [ cd['rep_gene'] for cd in cd_hit_clusters ]
Counter(rep_gene_list)
# results
Counter({'AT1G24070': 1,
'AT1G24100': 1,
'AT1G24040': 1,
'AT1G24020': 1,
'AT1G24010': 1,
'AT1G24000': 1})
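Since the question also asks for a dataframe, the same list-comprehension idea can produce one flat record per cluster. This is only a minimal sketch: it uses a shortened copy of the list above, and the column names n_genes and member_genes are my own illustrative choices, not something from the original data.

```python
# Shortened copy of the list-of-dict from above (two entries only)
cd_hit_clusters = [
    {'genes': {'AT1G24070': 100.0}, 'rep_gene': 'AT1G24070'},
    {'genes': {'AT1G24040': 100.0, 'AT1G2404_1': 100.0, 'AT1G2404_2': 100.0},
     'rep_gene': 'AT1G24040'},
]

# One flat record per cluster; the extra column names are illustrative
rows = [
    {
        'rep_gene': cd['rep_gene'],
        'n_genes': len(cd['genes']),
        'member_genes': ', '.join(cd['genes']),
    }
    for cd in cd_hit_clusters
]

for row in rows:
    print(row)
```

If pandas is installed, passing rows to pd.DataFrame(rows) should give a table with one row per cluster, since the DataFrame constructor accepts a list of dicts.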
Upvotes: 1
Reputation: 3466
Your cdhit1 contains the following list:
[
{
"genes":{
"AT1G24070":100.0
},
"rep_gene":"AT1G24070"
},
{
"genes":{
"AT1G24100":100.0
},
"rep_gene":"AT1G24100"
},
{
"genes":{
"AT1G24040":100.0,
"AT1G2404_1":100.0,
"AT1G2404_2":100.0
},
"rep_gene":"AT1G24040"
},
{
"genes":{
"AT1G24020":100.0,
"AT1G2402_1":100.0
},
"rep_gene":"AT1G24020"
},
{
"genes":{
"AT1G24010":100.0
},
"rep_gene":"AT1G24010"
},
{
"genes":{
"AT1G24000":100.0
},
"rep_gene":"AT1G24000"
}
]
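cdhit1 is a list, not a dictionary, which is why .get fails on it, so you need to specify which index you want to use. I have never worked with pandas, but try cdhit1[0] and see what it returns. As you may notice, every element of the list has its own "rep_gene" key, so a single index only gives you one of them.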
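To make that concrete, here is a small sketch of indexing one element versus looping over all of them. The gene_clusters value is a cut-down stand-in for the dictionary in the question (two entries only), not the full data.

```python
# Cut-down stand-in for the dictionary in the question
gene_clusters = {
    'cluster-1': {
        'BGCid': '-',
        'cdhitclusters': [
            {'genes': {'AT1G24070': 100.0}, 'rep_gene': 'AT1G24070'},
            {'genes': {'AT1G24100': 100.0}, 'rep_gene': 'AT1G24100'},
        ],
    },
}

cdhit1 = gene_clusters['cluster-1']['cdhitclusters']
print(type(cdhit1).__name__)  # a list, so it has no .get method

# Index a single element, then read its 'rep_gene' key
first_rep = cdhit1[0]['rep_gene']

# Or loop over every element to collect all representative genes
all_reps = [entry['rep_gene'] for entry in cdhit1]

print(first_rep)
print(all_reps)
```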
Upvotes: 1