mastersom

Reputation: 545

How to convert a list of dictionaries containing equal length lists into a dataframe without using a for loop

I have a list of dictionaries whose values are lists of equal length (see the sample code below). I would like to build a single dataframe from them without using a for loop, or in any other way that is faster than what I do now.

ls = [ dict[lists], dict[lists], ... ]

Initially I built one dataframe per dictionary in a list comprehension and then concatenated the results (see the code below). This is quite slow for the number of dictionaries I have.

import numpy as np
import pandas as pd
temp_data_m1 = [{'x': np.random.rand(9).tolist(), 'y': np.random.rand(9).tolist(), 'z': np.random.rand(9).tolist()}] * 50
data_reshuffled1 = pd.concat([pd.DataFrame(dict_) for dict_ in temp_data_m1]).reset_index()

Is there a faster way to do this, perhaps without a for loop?

Upvotes: 1

Views: 42

Answers (2)

Alexander

Reputation: 109546

You could first flatten the original data with a list comprehension nested inside a dictionary comprehension, and then build the dataframe in a single call. This assumes that every item in temp_data_m1 has the same dictionary keys.

# Sample data.
temp_data_m1 = [
    {'x': np.random.rand(3).tolist(), 
     'y': np.random.rand(3).tolist(), 
     'z': np.random.rand(3).tolist()}] * 2   

cols = temp_data_m1[0].keys()
df = pd.DataFrame(
    {col: [val for group in temp_data_m1 for val in group[col]] 
     for col in cols}
)
>>> df
          x         y         z
0  0.348319  0.404375  0.817278
1  0.887448  0.438613  0.368390
2  0.971582  0.533209  0.119674
3  0.348319  0.404375  0.817278
4  0.887448  0.438613  0.368390
5  0.971582  0.533209  0.119674
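Before relying on the flattened version, it may be worth checking that it produces the same frame as the concat approach from the question. The following is a minimal sketch of such a check, assuming a small seeded sample; the names data, baseline and flattened are made up for illustration. Note that it uses reset_index(drop=True), as in the timings, so the old index is not kept as an extra column.

```python
import numpy as np
import pandas as pd

# Small sample with distinct dictionaries (hypothetical data, seeded for repeatability).
rng = np.random.default_rng(0)
data = [
    {key: rng.random(3).tolist() for key in ("x", "y", "z")}
    for _ in range(4)
]

# Baseline from the question: one DataFrame per dictionary, then concatenate.
baseline = pd.concat([pd.DataFrame(d) for d in data]).reset_index(drop=True)

# Flattening approach: build one long list per column, then a single DataFrame.
cols = data[0].keys()
flattened = pd.DataFrame(
    {col: [val for group in data for val in group[col]] for col in cols}
)

# Raises AssertionError if the two frames differ in values, dtypes, or labels.
pd.testing.assert_frame_equal(baseline, flattened)
```

If the assertion passes silently, the two approaches agree on this sample.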

Timings

temp_data_m1 = [
    {'x': np.random.rand(3).tolist(), 
     'y': np.random.rand(3).tolist(), 
     'z': np.random.rand(3).tolist()}] * 20000

%%timeit 
cols = temp_data_m1[0].keys()
pd.DataFrame({col: [val for group in temp_data_m1 for val in group[col]] 
              for col in cols})
# output: 22.8 ms ± 849 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit pd.concat([pd.DataFrame(dict_) for dict_ in temp_data_m1]).reset_index(drop=True)
# output: 11.6 s ± 396 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
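Because all the lists have equal length, another option is to go through a single NumPy array and reshape it. This is only a sketch under that assumption, not something measured here; dicts_to_frame and records are hypothetical names, and since everything passes through one array, all columns end up with the same dtype. Whether it beats the comprehension above would need to be timed on your own data.

```python
import numpy as np
import pandas as pd


def dicts_to_frame(records):
    """Stack a list of dicts of equal-length lists into one DataFrame.

    Assumes every dict has the same keys and every list has the same length.
    """
    cols = list(records[0].keys())
    # Array of shape (n_dicts, n_cols, list_len).
    arr = np.array([[rec[col] for col in cols] for rec in records])
    # Move the column axis last, then merge the first two axes into rows.
    stacked = arr.transpose(0, 2, 1).reshape(-1, len(cols))
    return pd.DataFrame(stacked, columns=cols)


# Tiny hand-written sample so the row order is easy to follow.
records = [
    {"x": [1.0, 2.0], "y": [3.0, 4.0], "z": [5.0, 6.0]},
    {"x": [7.0, 8.0], "y": [9.0, 10.0], "z": [11.0, 12.0]},
]
df = dicts_to_frame(records)
```

The rows keep the same order as in the concat approach: all rows of the first dictionary, then all rows of the second, and so on.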

Upvotes: 1

run-out

Reputation: 3184

I think you will still need a loop, but accumulating the values in plain Python lists and a dictionary first avoids the overhead of calling pd.DataFrame and pd.concat once per dictionary, and will be significantly faster.

x_li = []
y_li = []
z_li = []

for d in ls:
    # Each item of ls is a dictionary of lists, as in the question.
    x_li.extend(d['x'])
    y_li.extend(d['y'])
    z_li.extend(d['z'])

dt = {'x': x_li, 'y': y_li, 'z': z_li}

df = pd.DataFrame(data=dt)

print(df)

           x         y         z
0   0.407243  0.064404  0.994289
1   0.778702  0.689556  0.246598
2   0.222480  0.236671  0.792531
3   0.114732  0.517506  0.901426
4   0.535884  0.138807  0.034585
5   0.621681  0.963316  0.628685
6   0.643132  0.994186  0.084340
7   0.167652  0.430170  0.344222
8   0.212579  0.649676  0.231918
9   0.704128  0.509263  0.047317
10  0.409379  0.939604  0.749458
11  0.029804  0.909334  0.520931
12  0.090505  0.834817  0.603464
13  0.837209  0.394173  0.877899
14  0.344467  0.602398  0.791664
15  0.077600  0.160189  0.237363
16  0.814201  0.104583  0.428033
17  0.899438  0.498138  0.855949
18  0.713373  0.732715  0.508276
19  0.211193  0.471923  0.526867
20  0.548586  0.136339  0.863532
21  0.041740  0.315708  0.116254
22  0.943269  0.056732  0.498985
23  0.085343  0.242628  0.039939
24  0.070387  0.114533  0.790064
25  0.568233  0.323008  0.811011
26  0.704781  0.221614  0.496521
27  0.089998  0.395631  0.703831
28  0.097087  0.012521  0.863149
29  0.731969  0.736039  0.147671
30  0.068417  0.117126  0.503902
31  0.487064  0.869781  0.677574
32  0.340297  0.633361  0.277859
33  0.141047  0.419666  0.193531
34  0.295001  0.845972  0.473824
35  0.217506  0.011523  0.717565
36  0.497627  0.059094  0.052230
37  0.658364  0.645356  0.712826
38  0.485345  0.600351  0.346634
39  0.395588  0.513874  0.797076
40  0.864188  0.786392  0.279711
41  0.979751  0.256491  0.305805
42  0.454343  0.954908  0.636447
43  0.279274  0.826389  0.891240
44  0.226816  0.222137  0.665129
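The same loop can be written without hard-coding the x, y and z accumulators. This is a rough sketch, assuming every item of ls is a dictionary of lists as in the question; accumulate_columns is a hypothetical helper name, not part of pandas.

```python
from collections import defaultdict

import pandas as pd


def accumulate_columns(ls):
    """Collect the values for each key across a list of dicts of lists."""
    columns = defaultdict(list)
    for record in ls:
        for key, values in record.items():
            # extend appends in place instead of building a new list each time.
            columns[key].extend(values)
    # Convert to a plain dict before handing it to pandas.
    return pd.DataFrame(dict(columns))


# Tiny hand-written sample (hypothetical data).
ls = [{"x": [1, 2], "y": [3, 4]}, {"x": [5], "y": [6]}]
df = accumulate_columns(ls)
```

Using extend rather than list concatenation keeps the work roughly proportional to the total number of values, since no intermediate copies of the growing lists are made.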

Upvotes: 0
