thejahcoop

Reputation: 170

Creating a Pandas DataFrame from list of dictionaries? Each dictionary as row in DataFrame?

I have been through several posts, but I cannot work out how to turn each dictionary in a list of dictionaries into a row of a pandas DataFrame. Specifically, I have two issues that my limited experience with dictionaries has not let me work around.

  1. So far I have separated each key and value into two columns; however, what I am looking for is one row per dictionary, with the keys as the column names.
  2. Only the first key in each dictionary is unique, so I would like either to drop it completely or to use that key as the value of a column named "id".

Example List of Dictionaries (>500k in total):

pep_list = [
    {'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',
     'gene': 'HV404',
     'aa_comp': {'W': 4, 'V': 5, 'L': 5, 'S': 10, 'Q': 3, 'E': 1, 'G': 5, 'P': 2,
                 'K': 1, 'T': 2, 'C': 1, 'A': 1, 'I': 1, 'N': 1, 'R': 1},
     'peptide': ['WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'],
     'Length': 43,
     'z': 3,
     'Mass': 4557,
     'm/z': 1519.0},
    {'A0A0G2JNQ3': 'ISGNTSR',
     'gene': 'A0A0G2JNQ3',
     'aa_comp': {'I': 1, 'S': 2, 'G': 1, 'N': 1, 'T': 1, 'R': 1},
     'peptide': ['ISGNTSR'],
     'Length': 7,
     'z': 2,
     'Mass': 715,
     'm/z': 357.5},
    # etc.
]

Expected output:

Dataframe = pd.DataFrame({values from dictionaries}, columns=['id', 'gene', 'aa_comp', 'peptide', 'length', 'z', 'mass', 'm/z'])
id column holding the unique keys
dictionary 1 values in separate columns
dictionary 2 values in separate columns

Thank you for any insight!

Upvotes: 0

Views: 2309

Answers (1)

Andrew Holmgren

Reputation: 1275

Whatever these entries are, the ones whose key is different in every dictionary,

{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',}
{'A0A0G2JNQ3': 'ISGNTSR',}

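are messing it up: pandas makes a column for every distinct key, so each of them would become its own, mostly empty column. They also don't look like they are needed, because the same information is repeated under 'gene' and 'peptide'.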
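To see the effect on a small scale, here is a minimal sketch using two records modeled on the question's data. The records are trimmed for brevity ('aa_comp', 'peptide', 'Mass' and 'm/z' are left out), and the name `df_raw` is only illustrative.

```python
import pandas as pd

# Two abbreviated records modeled on the question's data (several fields omitted).
pep_list = [
    {'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',
     'gene': 'HV404', 'Length': 43, 'z': 3},
    {'A0A0G2JNQ3': 'ISGNTSR',
     'gene': 'A0A0G2JNQ3', 'Length': 7, 'z': 2},
]

# Passing the list straight to the constructor uses the union of all keys as columns.
df_raw = pd.DataFrame(pep_list)

# Each per-record ID key shows up as its own column ...
print(list(df_raw.columns))

# ... and is NaN in every row except the one record that defined it.
print(df_raw[['HV404', 'A0A0G2JNQ3']].isna())
```

With more than 500k records, that would mean roughly one extra column per record, which is why those keys need to be removed or folded into a single column first.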

If you want to take out a non-representative key, you can do something like this:

import pandas as pd

# Keys common to the first two dicts (assumes every record shares them); the unique ID key drops out.
key_intersect = set(pep_list[0]).intersection(pep_list[1])
new_list_of_dictionaries = [{key: value for key, value in d.items() if key in key_intersect} for d in pep_list]
df = pd.DataFrame(new_list_of_dictionaries)

The code is pretty compact, but you could unroll it into loops if needed. Beware of blindly taking out the first element: plain dicts only guarantee insertion order from Python 3.7 onward, so on older versions, unless it is an ordered dict, the first element is not guaranteed to be the same.
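If you would rather keep the unique key as an "id" column, as the question asks, one possible sketch is below. It assumes the unique key always has the same name as the record's 'gene' value, which holds for the two examples shown but is not confirmed for the full dataset. The helper `to_row` is a hypothetical name, and the sample records are trimmed for brevity.

```python
import pandas as pd

def to_row(record):
    """Return a copy of `record` with the per-record ID key replaced by an 'id' entry."""
    row = dict(record)          # shallow copy, so the original list is left untouched
    gene = row.get('gene')
    # Assumption: the unique key is named after the 'gene' value.
    row.pop(gene, None)         # drop that key if it is present
    return {'id': gene, **row}  # put 'id' first, followed by the shared keys

# Two abbreviated records modeled on the question's data (several fields omitted).
pep_list = [
    {'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR', 'gene': 'HV404',
     'peptide': ['WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'], 'Length': 43, 'z': 3},
    {'A0A0G2JNQ3': 'ISGNTSR', 'gene': 'A0A0G2JNQ3',
     'peptide': ['ISGNTSR'], 'Length': 7, 'z': 2},
]

df = pd.DataFrame([to_row(record) for record in pep_list])
print(df)
```

This removes the key by name instead of by position, so it does not depend on dictionary ordering. If the assumption about 'gene' does not hold for some records, their unique key would stay in place and reappear as an extra column, which makes such records easy to spot.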

Upvotes: 2
