Reputation: 50
How can I create a DataFrame from a list of dictionaries, where each dictionary maps a column name to a list of values for that column? Please see the example below:
>>> import pandas as pd
>>> rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
>>> rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
>>> rec_set_all = [rec_set1, rec_set2]
>>> df = pd.DataFrame.from_records(rec_set1)
>>> df
   col1  col2 col3
0     1     5    x
1     2     3    y
2     3     4    z
All good so far.
Now I try to append rec_set2 and this is what happens:
>>> df = df.append(rec_set2, ignore_index=True)
>>> df
        col1        col2       col3
0          1           5          x
1          2           3          y
2          3           4          z
3  [5, 6, 7]  [-4, 6, 2]  [p, q, r]
Not what I was expecting. Which append function should I use?
And rather than doing it in a loop, is there a simple one-line way to create the entire DataFrame from rec_set_all?
Upvotes: 1
Views: 191
Reputation: 121
Assuming you are starting out with a list of dictionaries of lists, you can start by using a list comprehension to turn it into a list of DataFrames:
rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
... (etc.)
rec_setn = {...}
rec_set_all = [rec_set1, rec_set2,...,rec_setn]
df_list = [pd.DataFrame(r) for r in rec_set_all]
Next, you can use pd.concat to combine them all into one DataFrame:
df_all = pd.concat(df_list)
If you want to reset the index so that it is continuous rather than 0,1,2,0,1,2, etc., you can renumber it from 0:
df_all.reset_index(inplace=True, drop=True)
The result from your example would be:
   col1  col2 col3
0     1     5    x
1     2     3    y
2     3     4    z
3     5    -4    p
4     6     6    q
5     7     2    r
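To check the index behaviour described above, here is a minimal sketch using the two record sets from the question. It assumes pandas is installed; index_before and index_after are names introduced only for this illustration.

```python
import pandas as pd

rec_set1 = {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']}
rec_set2 = {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']}
rec_set_all = [rec_set1, rec_set2]

# One DataFrame per dictionary of column lists
df_list = [pd.DataFrame(r) for r in rec_set_all]

# A plain concat keeps each frame's own index, so the labels repeat
df_all = pd.concat(df_list)
index_before = df_all.index.tolist()

# Renumber the rows from 0; drop=True discards the old index
# instead of keeping it as an extra column
df_all = df_all.reset_index(drop=True)
index_after = df_all.index.tolist()

print(index_before)
print(index_after)
```

Assigning the result of reset_index back to df_all is an alternative to inplace=True and should give the same frame.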
Including info from the comment from AMC, it can be written as a one-liner:
df = pd.concat([pd.DataFrame(r) for r in rec_set_all], ignore_index=True)
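If you need this in more than one place, the one-liner could be wrapped in a small function. The sketch below is only one possible way to do it: frame_from_record_sets is a hypothetical name, not a pandas function, and the guard for an empty list is my own assumption about what you would want in that case.

```python
import pandas as pd

def frame_from_record_sets(record_sets):
    """Hypothetical helper: one DataFrame from a list of dicts of column lists."""
    if not record_sets:
        # pd.concat raises ValueError when given nothing to concatenate,
        # so return an empty frame instead (an assumption, adjust as needed)
        return pd.DataFrame()
    return pd.concat([pd.DataFrame(r) for r in record_sets], ignore_index=True)

rec_set1 = {'col1': [1, 2, 3], 'col2': [5, 3, 4], 'col3': ['x', 'y', 'z']}
rec_set2 = {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']}
rec_set_all = [rec_set1, rec_set2]

df = frame_from_record_sets(rec_set_all)
print(df)
```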
Upvotes: 2
Reputation: 338
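This will also work. Just convert the new dict to a DataFrame before appending it. Note that DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so on current versions use pd.concat for the same step.
import pandas as pd
rec_set1 = {'col1': [1,2,3], 'col2': [5,3,4], 'col3': ['x','y','z']}
rec_set2 = {'col1': [5,6,7], 'col2': [-4,6,2], 'col3': ['p','q','r']}
rec_set_all = [rec_set1, rec_set2]
df = pd.DataFrame(rec_set1)
# append rec_set2 as a DataFrame (pandas < 2.0); append returns a new frame, so assign it back
# df = df.append(pd.DataFrame(rec_set2), ignore_index=True)
# equivalent step on pandas >= 2.0
df = pd.concat([df, pd.DataFrame(rec_set2)], ignore_index=True)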
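As a rough illustration of why converting to a DataFrame first matters: a dict handed over as a single record is read as one row, so each list lands in a single cell, whereas a dict of lists passed to pd.DataFrame is read as columns. The sketch below shows that difference with the DataFrame constructor only. It is not meant to reproduce what append does internally, and as_single_row and as_columns are names made up for the example.

```python
import pandas as pd

rec_set2 = {'col1': [5, 6, 7], 'col2': [-4, 6, 2], 'col3': ['p', 'q', 'r']}

# A dict inside a list is treated as ONE row: every cell holds a whole list
as_single_row = pd.DataFrame([rec_set2])

# A dict of lists on its own is treated as columns: one row per list element
as_columns = pd.DataFrame(rec_set2)

print(as_single_row.shape)
print(as_columns.shape)
```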
Upvotes: 0