Reputation: 48466
I have a list of Python dicts, each with the same keys,
import numpy as np
import pandas as pd

dict_keys = ['k1','k2','k3','k4','k5','k6']  # More like 30 keys in practice
data = []
for i in range(20):  # More like 3000 in practice
    data.append({k: np.random.randint(100) for k in dict_keys})
and would like to use it to create a corresponding Pandas dataframe with a subset of the keys. My current approach is to take each dict from the list one at a time and append it to the dataframe using
df = pd.DataFrame(columns=['k1','k2','k5','k6'])
for d in data:
    df = df.append({k: d[k] for k in list(df.columns)}, ignore_index=True)
    # In practice, there are some calculations on some of the values here
but this is very slow (the actual list, and the dicts it contains, are both quite large).
Is there a better, faster (and more idiomatic) method for iterating through a list of dictionaries and adding them as rows to a Pandas dataframe?
Upvotes: 8
Views: 5314
Reputation: 64318
Simply pass data to DataFrame's __init__, or to DataFrame.from_records (either would work).
You might also want to set an index, e.g. DataFrame.from_records(data, index='k1').
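As a minimal sketch of the above, assuming the question's data layout: the fixed integer values and the name wanted are made up here for illustration, and passing columns to the constructor is one way (not the only one) to keep just a subset of the keys.

```python
import pandas as pd

# Hypothetical stand-in for the question's list of dicts
# (fixed values instead of random ones, so the result is reproducible).
dict_keys = ['k1', 'k2', 'k3', 'k4', 'k5', 'k6']
data = [{k: i * 10 + j for j, k in enumerate(dict_keys)} for i in range(20)]

# Subset of keys to keep (illustrative name, same subset as in the question).
wanted = ['k1', 'k2', 'k5', 'k6']

# Build the whole frame in one call; `columns` keeps only the listed keys.
df = pd.DataFrame(data, columns=wanted)

# Same records, but with 'k1' used as the index instead of a regular column.
df_indexed = pd.DataFrame.from_records(data, index='k1')
```

Either call builds all rows at once, so no per-row append is needed.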
If you need to also perform some calculations, it's usually easier and more convenient to do it on the DataFrame, after creating it. Leverage pandas!
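For instance, the per-row calculations could be moved onto whole columns once the frame exists. The question does not say what the calculations are, so the two derived columns below (k1_plus_k2 and k6_share) and the sample values are purely hypothetical:

```python
import pandas as pd

# Hypothetical sample records with the same keys as the question's subset.
data = [{'k1': i, 'k2': 2 * i, 'k5': i + 1, 'k6': 100 - i} for i in range(20)]
df = pd.DataFrame(data)

# Made-up calculations, applied to entire columns at once
# rather than to one dict at a time inside a Python loop.
df['k1_plus_k2'] = df['k1'] + df['k2']
df['k6_share'] = df['k6'] / df['k6'].sum()
```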
Upvotes: 15