user1828605

Reputation: 1735

How to extract values from specific keys from list of dictionaries in python?

Given data like the following, a list of dictionaries, how can I extract the values of only certain keys from each dictionary?

comps = [
{
    "name":'Test1',
    "p_value":0.02,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test2',
    "p_value":0.05,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test3',
    "p_value":0.03,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test4',
    "p_value":0.07,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test5',
    "p_value":0.03,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test6',
    "p_value":0.02,
    "group0_null": 0.0,
    "group1_null": 0.0,
},{
    "name":'Test7',
    "p_value":0.01,
    "group0_null": 0.0,
    "group1_null": 0.0,
}]

Result

From the data above, let's say I only want name and p_value. How can I get this result?

[{
    "name":'Test1',
    "p_value":0.02,
},{
    "name":'Test2',
    "p_value":0.05,
},{
    "name":'Test3',
    "p_value":0.03,
},{
    "name":'Test4',
    "p_value":0.07,
},{
    "name":'Test5',
    "p_value":0.03,
},{
    "name":'Test6',
    "p_value":0.02,
},{
    "name":'Test7',
    "p_value":0.01,
}]

This shows everything:

[c for c in comps]

This shows only the names:

[c['name'] for c in comps]

But if I do this:

[c['name','p_value'] for c in comps ]

I get the error:

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-94-b29459f7b089> in <module>
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
      2 
      3 # cont_comps = []
      4 
      5 # for c in comps['continuous_explainers']:

<ipython-input-94-b29459f7b089> in <listcomp>(.0)
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
      2 
      3 # cont_comps = []
      4 
      5 # for c in comps['continuous_explainers']:

KeyError: ('name', 'p_value')

The real data dictionary is much larger than this. I want to do this so that I end up with a list containing only the fields I need.

UPDATE

Since some pointed out that the structure of the data that I showed is different from what I receive from the server, here's the code that I used to pull the data.

# get all comparisons
comps = source.get_comparison(name='Pr1 vs. Rest')

# only take the continuous explainers 
comps['continuous_explainers'][1:5]

DATA

[{'name': 'Gender',
  'column_index': 2,
  'ks_score': 0.0022329709328575142,
  'p_value': 1.0,
  'quartiles': [[0.0, 0.0, 1.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0]],
  't_test_p_value': 0.8341377317414621,
  'diff_means': 0.0014959875249118681,
  'primary_group_mean': 0.6312769010043023,
  'secondary_group_mean': 0.6297809134793905,
  'ks_sign': '+',
  'group0_percent_null': 0.0,
  'group1_percent_null': 0.0},
 {'name': 'Gender_Missing_color',
  'column_index': 3,
  'ks_score': 2.220446049250313e-16,
  'p_value': 1.0,
  'quartiles': [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]],
  't_test_p_value': 1.0,
  'diff_means': 0.0,
  'primary_group_mean': 1.0,
  'secondary_group_mean': 1.0,
  'ks_sign': '0',
  'group0_percent_null': 0.9966523194643712,
  'group1_percent_null': 0.9959153360564427},
 {'name': 'Gender_Missing',
  'column_index': 4,
  'ks_score': 0.0007369834078797544,
  'p_value': 1.0,
  'quartiles': [[0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
  't_test_p_value': 0.40301091478187256,
  'diff_means': -0.0007369834079284866,
  'primary_group_mean': 0.0033476805356288893,
  'secondary_group_mean': 0.004084663943557376,
  'ks_sign': '-',
  'group0_percent_null': 0.0,
  'group1_percent_null': 0.0},
 {'name': 'Male',
  'column_index': 5,
  'ks_score': 0.0029699543407862294,
  'p_value': 0.9999999999915384,
  'quartiles': [[0.0, 0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]],
  't_test_p_value': 0.6740956861786738,
  'diff_means': 0.0029699543407684104,
  'primary_group_mean': 0.6245815399330444,
  'secondary_group_mean': 0.621611585592276,
  'ks_sign': '+',
  'group0_percent_null': 0.0,
  'group1_percent_null': 0.0}]

This is the output I get. As mentioned above, I only need some data from this list of dictionaries.

Upvotes: 0

Views: 106

Answers (3)

user1828605

Reputation: 1735

I'm still not sure how to make the other answers work for me. However, I figured out another way to do this:

import pandas as pd

test = [(c['name'], c['p_value'], c['group0_percent_null']) for c in comps]
pd.DataFrame(test)

                      0             1         2
0                    ID  5.374590e-13  0.000000
1                Gender  1.000000e+00  0.000000
2  Gender_Missing_color  1.000000e+00  0.996652
3        Gender_Missing  1.000000e+00  0.000000
4                  Male  1.000000e+00  0.000000
...                 ...           ...       ...

It gave me the result I was looking for.
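The tuple approach above leaves the columns labeled 0, 1, 2. As a hedged alternative, the list of dictionaries can be passed to pandas directly and the wanted columns selected by name. This is only a sketch: `rows` and `columns` are illustrative names, and the two sample records are abbreviated from the data in the question.

```python
import pandas as pd

# Abbreviated stand-in for the list of dictionaries shown in the question.
rows = [
    {"name": "Gender", "p_value": 1.0, "group0_percent_null": 0.0, "ks_sign": "+"},
    {"name": "Gender_Missing_color", "p_value": 1.0,
     "group0_percent_null": 0.9966523194643712, "ks_sign": "0"},
]

# Columns to keep; every other key is dropped by the selection below.
columns = ["name", "p_value", "group0_percent_null"]

# Each dict becomes a row and each key a column; then select by column name.
df = pd.DataFrame(rows)[columns]
print(df)
```

This keeps the original key names as column headers, which may make later filtering (for example on p_value) easier to read.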

Upvotes: 1

Hadrian

Reputation: 927

Try:

[{'name':c['name'], 'p_value':c['p_value']} for c in comps]
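For context on why the original attempt fails: `c['name', 'p_value']` is treated as a lookup of the single tuple key `('name', 'p_value')`, which none of the dictionaries contain, hence the `KeyError`. When more than a couple of keys are needed, the same idea can be written with the wanted keys held in a list. A minimal sketch, where `wanted` and `subset` are illustrative names and the sample data is shortened from the question:

```python
# Small stand-in for the list of dictionaries from the question.
comps = [
    {"name": "Test1", "p_value": 0.02, "group0_null": 0.0, "group1_null": 0.0},
    {"name": "Test2", "p_value": 0.05, "group0_null": 0.0, "group1_null": 0.0},
]

# Keys to keep; `wanted` is an illustrative name, not from the question.
wanted = ["name", "p_value"]

# Build one new, smaller dict per original dict.
subset = [{key: c[key] for key in wanted} for c in comps]
print(subset)
```

Adding another field then only means appending its key to `wanted`. Note that `c[key]` still raises `KeyError` if a dictionary lacks one of the listed keys.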

Upvotes: -1

Ori David

Reputation: 362

You could create a new dict for each item in comparisons (your list of dictionaries), containing only the name and p_value keys:

ex = [{'name': d['name'], 'p_value': d['p_value']} for d in comparisons]
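The traceback in the question suggests that the object returned by the server is itself a dictionary, with the list stored under the `'continuous_explainers'` key. Under that assumption, the comprehension has to iterate over the nested list rather than the outer dict (iterating over a dict yields its string keys, and indexing a string with `'name'` fails). A sketch with abbreviated stand-in data:

```python
# Stand-in for the server response: a dict whose 'continuous_explainers'
# entry holds the list of dictionaries (values abbreviated from the question).
comps = {
    "continuous_explainers": [
        {"name": "Gender", "column_index": 2, "p_value": 1.0, "ks_sign": "+"},
        {"name": "Male", "column_index": 5, "p_value": 0.9999999999915384, "ks_sign": "+"},
    ]
}

# Iterate over the nested list, not over the outer dict.
comparisons = comps["continuous_explainers"]
ex = [{"name": d["name"], "p_value": d["p_value"]} for d in comparisons]
print(ex)
```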

Upvotes: 2
