Reputation: 1735
I have data in the following form, a list of dictionaries. How can I extract only some of the keys and their values from each dictionary?
comps = [
{
"name":'Test1',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test2',
"p_value":0.05,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test3',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test4',
"p_value":0.07,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test5',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test6',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test7',
"p_value":0.01,
"group0_null": 0.0,
"group1_null": 0.0,
}]
Result
From the data above, let's say I only want name and p_value. How can I get this result?
[{
"name":'Test1',
"p_value":0.02,
},{
"name":'Test2',
"p_value":0.05,
},{
"name":'Test3',
"p_value":0.03,
},{
"name":'Test4',
"p_value":0.07,
},{
"name":'Test5',
"p_value":0.03,
},{
"name":'Test6',
"p_value":0.02,
},{
"name":'Test7',
"p_value":0.01,
}]
This shows everything:
[c for c in comps]
This shows only the names:
[c['name'] for c in comps]
But if I do this:
[c['name','p_value'] for c in comps ]
I get the error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-94-b29459f7b089> in <module>
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
<ipython-input-94-b29459f7b089> in <listcomp>(.0)
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
KeyError: ('name', 'p_value')
The real data dictionary is much larger than this. I want to do this so that I end up with a list containing only the fields I need.
UPDATE
Since some people pointed out that the structure of the data I showed differs from what I receive from the server, here's the code I used to pull the data.
# get all comparisons
comps = source.get_comparison(name='Pr1 vs. Rest')
# only take the continuous explainers
comps['continuous_explainers'][1:5]
DATA
[{'name': 'Gender',
'column_index': 2,
'ks_score': 0.0022329709328575142,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0]],
't_test_p_value': 0.8341377317414621,
'diff_means': 0.0014959875249118681,
'primary_group_mean': 0.6312769010043023,
'secondary_group_mean': 0.6297809134793905,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Gender_Missing_color',
'column_index': 3,
'ks_score': 2.220446049250313e-16,
'p_value': 1.0,
'quartiles': [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]],
't_test_p_value': 1.0,
'diff_means': 0.0,
'primary_group_mean': 1.0,
'secondary_group_mean': 1.0,
'ks_sign': '0',
'group0_percent_null': 0.9966523194643712,
'group1_percent_null': 0.9959153360564427},
{'name': 'Gender_Missing',
'column_index': 4,
'ks_score': 0.0007369834078797544,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
't_test_p_value': 0.40301091478187256,
'diff_means': -0.0007369834079284866,
'primary_group_mean': 0.0033476805356288893,
'secondary_group_mean': 0.004084663943557376,
'ks_sign': '-',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Male',
'column_index': 5,
'ks_score': 0.0029699543407862294,
'p_value': 0.9999999999915384,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]],
't_test_p_value': 0.6740956861786738,
'diff_means': 0.0029699543407684104,
'primary_group_mean': 0.6245815399330444,
'secondary_group_mean': 0.621611585592276,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0}]
This is the output I get. As mentioned above, I only need some data from this list of dictionaries.
Upvotes: 0
Views: 106
Reputation: 1735
I'm still not sure how to make the answers above work for me. However, I figured out another way to do this:
import pandas as pd
# one tuple per dictionary, holding only the fields I need
test = [(c['name'],c['p_value'], c['group0_percent_null']) for c in comps]
pd.DataFrame(test)
0 1 2
0 ID 5.374590e-13 0.000000
1 Gender 1.000000e+00 0.000000
2 Gender_Missing_color 1.000000e+00 0.996652
3 Gender_Missing 1.000000e+00 0.000000
4 Male 1.000000e+00 0.000000
... ... ... ...
It gave me the result I was looking for.
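The tuple-building step above can also be written with operator.itemgetter from the standard library, which keeps the field names in one place. This is only a sketch: the two records below are a shortened, hypothetical sample modelled on the server output in the question, and the names fields, get_fields and rows are made up for illustration.

```python
from operator import itemgetter

# Shortened, hypothetical sample modelled on comps['continuous_explainers'].
comps = [
    {"name": "Gender", "p_value": 1.0, "group0_percent_null": 0.0, "ks_sign": "+"},
    {"name": "Gender_Missing_color", "p_value": 1.0,
     "group0_percent_null": 0.9966523194643712, "ks_sign": "0"},
]

# The fields to keep, listed once.
fields = ("name", "p_value", "group0_percent_null")

# With two or more keys, itemgetter returns a callable that yields a tuple of values.
get_fields = itemgetter(*fields)

# One tuple per dictionary, in the same order as `fields`.
rows = [get_fields(c) for c in comps]
print(rows)
```

Passing columns=list(fields) to pd.DataFrame(rows, ...) should label the columns with the field names instead of 0, 1, 2. Note that itemgetter called with a single key returns the bare value rather than a one-element tuple, so this pattern fits best when at least two fields are selected.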
Upvotes: 1
Reputation: 362
You could create a new dict for each object in comparisons, and initialize it only with the name and p_value keys.
ex = [{'name': d['name'], 'p_value': d['p_value']} for d in comparisons]
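If the set of keys grows, the same idea can be driven by a list of key names. The sketch below assumes every dictionary contains all the wanted keys; the helper pick and the list wanted are hypothetical names, and the sample data is a shortened copy of the question's example. It also shows why the attempt in the question fails: c['name', 'p_value'] is a single lookup with the tuple ('name', 'p_value') as the key, not two separate lookups.

```python
# Shortened, hypothetical sample with the same shape as the question's data.
comps = [
    {"name": "Test1", "p_value": 0.02, "group0_null": 0.0, "group1_null": 0.0},
    {"name": "Test2", "p_value": 0.05, "group0_null": 0.0, "group1_null": 0.0},
]

wanted = ["name", "p_value"]  # keys to keep


def pick(record, keys):
    """Return a new dict holding only `keys` from `record`."""
    return {k: record[k] for k in keys}


trimmed = [pick(c, wanted) for c in comps]
print(trimmed)

# Indexing with two comma-separated names builds a tuple key,
# which does not exist in the dictionary, hence the KeyError.
try:
    comps[0]["name", "p_value"]
except KeyError as err:
    print("KeyError:", err)
```

With the server data from the question, the list to iterate over would be comps['continuous_explainers'] rather than comps itself.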
Upvotes: 2