Reputation: 14450
I have a list of tuples of dicts: [(A, B), (A, B), ...]. I wrote A and B for the dictionaries because the keys are the same across these "types". I want a dataframe with some keys from A and some keys from B. Some of the keys in A are also present in B; I'd like to keep the keys from A.
I can think of a couple of ways, and I'm curious which will be more performant. I've listed them in the order of my best guess as to performance:

1. A list comprehension, building new dictionaries (or extending A with parts of B), and then pd.DataFrame.from_records.
2. pd.DataFrame.from_records has an exclude parameter. Merge the larger dicts first and then exclude columns when building the dataframe.
3. Transpose the list of tuples (maybe zip(*)?), create two dataframes with .from_records, one each for A and B, remove unnecessary columns from each, and then glue the resulting dataframes together side by side.
4. Make each dict (row) a dataframe and then glue them on top of one another vertically (append or concat or something).
As a complete newbie to pandas, it seems too difficult to tell what each operation does, and when it's building a view versus doing a copy, so I can't tell what is expensive and what isn't.
Am I missing an approach to this?
Are my solutions in the correct order of performance?
If instead of dictionaries, A
and B
were dataframes, would concatenating them be faster? How much memory overhead does a dataframe have, and is it ever common practice to have a one-row dataframe?
Here's some simplified example data:
[({"chrom": "chr1", "gStart": 1000, "gEnd": 2000, "other": "drop this"},
  {"chrom": "chr1", "pStart": 1500, "pEnd": 2500, "drop": "this"}),
 ({"chrom": "chr2", "gStart": 8000, "gEnd": 8500, "other": "unimportant"},
  {"chrom": "chr2", "pStart": 7500, "pEnd": 9500, "drop": "me"})]
The result I'd like would, I think, be the outcome of:
pd.DataFrame.from_records([
{"chrom": "chr1", "gStart": 1000, "gEnd": 2000, "pStart": 1500, "pEnd": 2500},
{"chrom": "chr2", "gStart": 8000, "gEnd": 8500, "pStart": 7500, "pEnd": 9500} ] )
I think this would work if dictionaries had a nice, in-place select
method:
A_fields = [...]
B_fields = [...]
A_B_merged = [a.select(A_fields).extend(b.select(B_fields)) for a, b in A_B_not_merged]
A_B_dataframe = pd.DataFrame.from_records(A_B_merged)
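Dicts don't have select/extend methods like that, but a dict comprehension does the same job. A minimal sketch using the sample data above (the field lists are assumptions); note that in a {**x, **y} merge the later dict wins on shared keys, so A is unpacked last to keep A's values:

```python
import pandas as pd

# Assumed field lists -- adjust to the keys you actually need.
A_fields = ["chrom", "gStart", "gEnd"]
B_fields = ["pStart", "pEnd"]

A_B_not_merged = [
    ({"chrom": "chr1", "gStart": 1000, "gEnd": 2000, "other": "drop this"},
     {"chrom": "chr1", "pStart": 1500, "pEnd": 2500, "drop": "this"}),
    ({"chrom": "chr2", "gStart": 8000, "gEnd": 8500, "other": "unimportant"},
     {"chrom": "chr2", "pStart": 7500, "pEnd": 9500, "drop": "me"}),
]

# "select" is just a dict comprehension; A is unpacked last so its
# values win for any key present in both dicts.
A_B_merged = [
    {**{k: b[k] for k in B_fields}, **{k: a[k] for k in A_fields}}
    for a, b in A_B_not_merged
]
A_B_dataframe = pd.DataFrame.from_records(A_B_merged)
```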
Upvotes: 3
Views: 646
Reputation: 5582
Use a plain old dictionary merge that combines the two dictionaries the pythonic (Python 3.5+) way, then use from_records to construct the DataFrame:
pd.DataFrame.from_records([{**d[0],**d[1]} for d in k])
chrom gEnd gStart pEnd pStart
0 chr1 2000 1000 2500 1500
1 chr2 8500 8000 9500 7500
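One caveat: in {**d[0], **d[1]} the second dict's values overwrite the first's for shared keys, while the question asks to keep A's values. Swapping the unpack order, and using from_records's exclude parameter to drop the unwanted keys, gives (a sketch with the question's sample data):

```python
import pandas as pd

k = [
    ({"chrom": "chr1", "gStart": 1000, "gEnd": 2000, "other": "drop this"},
     {"chrom": "chr1", "pStart": 1500, "pEnd": 2500, "drop": "this"}),
    ({"chrom": "chr2", "gStart": 8000, "gEnd": 8500, "other": "unimportant"},
     {"chrom": "chr2", "pStart": 7500, "pEnd": 9500, "drop": "me"}),
]

# Unpack B first so A's values win on shared keys like "chrom",
# then drop the unwanted columns at DataFrame construction time.
df = pd.DataFrame.from_records(
    [{**b, **a} for a, b in k], exclude=["other", "drop"]
)
```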
Upvotes: 0
Reputation: 32095
You need to go down two levels of your input to get it processed, i.e. flatten the list of tuples into a flat sequence of dicts. Your best friend here is chain.from_iterable:
import itertools as it
pd.DataFrame.from_records(it.chain.from_iterable(l))
Out[21]:
chrom gEnd gStart pEnd pStart
0 chr1 2000.0 1000.0 NaN NaN
1 chr1 NaN NaN 2500.0 1500.0
2 chr2 8500.0 8000.0 NaN NaN
3 chr2 NaN NaN 9500.0 7500.0
This requires classic and easy cleanup:
pd.DataFrame.from_records(it.chain.from_iterable(l)).set_index('chrom').stack().unstack()
Out[22]:
gEnd gStart pEnd pStart
chrom
chr1 2000.0 1000.0 2500.0 1500.0
chr2 8500.0 8000.0 9500.0 7500.0
Upvotes: 1