Reputation: 170
I have been through several posts, but I can't work out how to turn each dictionary in a list of dictionaries into a row of a pandas DataFrame. Specifically, there are two issues that my limited experience with dictionaries doesn't let me work around.
Example list of dictionaries (more than 500k in total):
pep_list=[{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR',
'gene': 'HV404',
'aa_comp': {'W': 4,
'V': 5,
'L': 5,
'S': 10,
'Q': 3,
'E': 1,
'G': 5,
'P': 2,
'K': 1,
'T': 2,
'C': 1,
'A': 1,
'I': 1,
'N': 1,
'R': 1},
'peptide': ['WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'],
'Length': 43,
'z': 3,
'Mass': 4557,
'm/z': 1519.0},
{'A0A0G2JNQ3': 'ISGNTSR',
'gene': 'A0A0G2JNQ3',
'aa_comp': {'I': 1, 'S': 2, 'G': 1, 'N': 1, 'T': 1, 'R': 1},
'peptide': ['ISGNTSR'],
'Length': 7,
'z': 2,
'Mass': 715,
'm/z': 357.5},
# ... more dictionaries (>500k in total)
]
Expected output:
Dataframe = pd.DataFrame(<values from the dictionaries>, columns=["id", "gene", "aa_comp", "peptide", "length", "z", "mass", "m/z"])
id | columns of keys |
---|---|
dictionary 1 | values in separate columns |
dictionary 2 | values in separate columns |
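A minimal sketch of the layout above, assuming every dictionary has the seven keys seen in the two samples ('gene', 'aa_comp', 'peptide', 'Length', 'z', 'Mass', 'm/z') plus exactly one extra key holding the entry's id (like 'HV404'). `COLUMNS`, `SHARED_KEYS` and `to_row` are illustrative names, not part of pandas, and the column names keep the capitalisation of the sample keys rather than the lowercase names in the expected output.

```python
import pandas as pd

# Column order from the expected output; capitalisation follows the sample keys.
COLUMNS = ["id", "gene", "aa_comp", "peptide", "Length", "z", "Mass", "m/z"]
# Assumption: these keys are present in every dictionary.
SHARED_KEYS = set(COLUMNS[1:])

# The two sample entries from the question.
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404",
     "aa_comp": {"W": 4, "V": 5, "L": 5, "S": 10, "Q": 3, "E": 1, "G": 5, "P": 2,
                 "K": 1, "T": 2, "C": 1, "A": 1, "I": 1, "N": 1, "R": 1},
     "peptide": ["WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR"],
     "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3",
     "aa_comp": {"I": 1, "S": 2, "G": 1, "N": 1, "T": 1, "R": 1},
     "peptide": ["ISGNTSR"], "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]


def to_row(entry):
    """Return one row dict; the single per-entry key (e.g. 'HV404') becomes 'id'."""
    row = {key: entry.get(key) for key in SHARED_KEYS}
    extra = [key for key in entry if key not in SHARED_KEYS]
    # Assumes exactly one extra key per dictionary; otherwise id is left as None.
    row["id"] = extra[0] if len(extra) == 1 else None
    return row


# Build all rows first, then call the DataFrame constructor once.
df = pd.DataFrame([to_row(entry) for entry in pep_list], columns=COLUMNS)
print(df[["id", "gene", "Length", "m/z"]])
```

Building the rows in a list and calling the constructor once avoids appending to a DataFrame row by row, which is much slower for 500k entries. If one column per residue is wanted later, `pd.DataFrame(df["aa_comp"].tolist())` expands the nested dicts, with NaN for residues missing from an entry.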
Thank you for any insight!
Upvotes: 0
Views: 2309
Reputation: 1275
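Whatever these per-entry keys are,
{'HV404': 'WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR'}
{'A0A0G2JNQ3': 'ISGNTSR'}
they are what's messing it up: every dictionary has a different one, so pandas turns each into its own column that is empty (NaN) in every other row. It also doesn't look like they are needed, because the same information is already repeated under 'gene' and 'peptide'.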
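A quick way to see the problem, using two of the question's entries trimmed to a few keys for brevity: passing the list straight to `pd.DataFrame` gives each per-entry key its own column, filled only in the row it came from. With 500k entries this could mean up to one extra, mostly empty column per distinct id key.

```python
import pandas as pd

# Two entries from the question, trimmed to a few keys for brevity.
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404", "Length": 43},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3", "Length": 7},
]

df = pd.DataFrame(pep_list)
# Each per-entry key shows up as its own column...
print(sorted(df.columns))
# ...and is NaN in every row except the one it came from.
print(df.isna().sum().to_dict())
```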
If you want to take out the non-representative keys, you can keep only the keys the dictionaries have in common, something like this:
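import pandas as pd
# keys shared by the first two dictionaries; the per-entry id keys differ, so they drop out
key_intersect = set(pep_list[0].keys()).intersection(set(pep_list[1].keys()))
# rebuild every dictionary with only the shared keys
new_list_of_dictionaries = [{key: value for (key, value) in dicts.items() if key in key_intersect} for dicts in pep_list]
df = pd.DataFrame(new_list_of_dictionaries)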
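It's pretty compact code, but you could unroll it into loops if needed. Beware of blindly removing the first key instead: plain dicts only preserve insertion order from Python 3.7 onward (use an OrderedDict on older versions), and even then it only works if every dictionary was built with the id key first.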
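Unrolled into plain loops, the same idea might look like the sketch below. One change from the snippet above, offered as a suggestion: the shared keys are intersected over every dictionary instead of only the first two. If the first two entries happened to carry the same id key (for example two peptides from the same gene, if that can occur in your data), a two-entry intersection would keep that key. The entries are the question's samples trimmed for brevity, and `shared_keys`/`rows` are illustrative names.

```python
import pandas as pd

# Two entries from the question, trimmed ('aa_comp' omitted) for brevity.
pep_list = [
    {"HV404": "WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR", "gene": "HV404",
     "peptide": ["WVLSQVQLQESGPGLVKPSGTLSLTCAVSGGSISSSNWWSWVR"],
     "Length": 43, "z": 3, "Mass": 4557, "m/z": 1519.0},
    {"A0A0G2JNQ3": "ISGNTSR", "gene": "A0A0G2JNQ3", "peptide": ["ISGNTSR"],
     "Length": 7, "z": 2, "Mass": 715, "m/z": 357.5},
]

# Start from the first dictionary's keys and keep only keys present in every dictionary.
shared_keys = set(pep_list[0])
for entry in pep_list[1:]:
    shared_keys &= set(entry)

# Copy each dictionary, keeping only the shared keys.
rows = []
for entry in pep_list:
    row = {}
    for key, value in entry.items():
        if key in shared_keys:
            row[key] = value
    rows.append(row)

df = pd.DataFrame(rows)
print(df)
```

The first loop computes the same set as the one-liner `set.intersection(*map(set, pep_list))`. Note the trade-off: a key missing from even one dictionary is dropped for all of them, whereas the two-entry version would keep it and fill NaN where it's absent.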
Upvotes: 2