Reputation: 477
I have a dataframe that looks like this:
>> df
A
0 [{k1:v1, k2:v2}, {k1:v3, k2:v4}]
1 [{k1:v5, k2:v6}, {k1:v7, k2:v8}, {k1:v9, k2:v10}]
That is, each entry in column A is a list of dicts that all share the same keys,
and I want to extract the values from the first dict in each list:
K1 K2 A
0 v1 v2 ...
1 v5 v6 ...
My solution so far works but is very slow (> 1 min for ~50K records):
def extract_first_dict(s):
    s['K1'] = s['A'][0]['k1']
    s['K2'] = s['A'][0]['k2']
    return s

df = df.apply(extract_first_dict, axis=1)
Could anybody suggest a better, faster way to do this? Thanks!
Upvotes: 4
Views: 3563
Reputation: 402483
concat
pd.concat([pd.DataFrame(df.A.str[0].tolist(), index=df.index), df], axis=1)
k1 k2 A
0 v1 v2 [{'k1': 'v1', 'k2': 'v2'}, {'k1': 'v3', 'k2': ...
1 v5 v6 [{'k1': 'v5', 'k2': 'v6'}, {'k1': 'v7', 'k2': ...
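For anyone who wants to run this end to end, here is a minimal sketch. The sample frame is rebuilt from the question with string placeholders, and the upper-casing step is my own addition so the columns match the K1/K2 names asked for.

```python
import pandas as pd

# Sample data rebuilt from the question (values are placeholder strings).
df = pd.DataFrame({
    "A": [
        [{"k1": "v1", "k2": "v2"}, {"k1": "v3", "k2": "v4"}],
        [{"k1": "v5", "k2": "v6"}, {"k1": "v7", "k2": "v8"}, {"k1": "v9", "k2": "v10"}],
    ]
})

# .str[0] takes the first element of each list, giving a Series of dicts.
first = pd.DataFrame(df["A"].str[0].tolist(), index=df.index)

# Assumption: upper-case the keys to get the K1/K2 names from the question.
first.columns = [c.upper() for c in first.columns]

result = pd.concat([first, df], axis=1)
print(result)
```

Passing index=df.index keeps the new columns aligned with the original rows even when the index is not the default 0..n-1.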
Upvotes: 2
Reputation: 294258
df.A.str[0].apply(pd.Series)
k1 k2
0 v1 v2
1 v5 v6
with join
df.A.str[0].apply(pd.Series).join(df)
k1 k2 A
0 v1 v2 [{'k1': 'v1', 'k2': 'v2'}, {'k1': 'v3', 'k2': ...
1 v5 v6 [{'k1': 'v5', 'k2': 'v6'}, {'k1': 'v7', 'k2': ...
pd.DataFrame([t[0] for t in df.A], df.index)
k1 k2
0 v1 v2
1 v5 v6
with join
pd.DataFrame([t[0] for t in df.A], df.index).join(df)
k1 k2 A
0 v1 v2 [{'k1': 'v1', 'k2': 'v2'}, {'k1': 'v3', 'k2': ...
1 v5 v6 [{'k1': 'v5', 'k2': 'v6'}, {'k1': 'v7', 'k2': ...
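A quick sanity check that the two approaches above produce the same values, using sample data rebuilt from the question. This is only a sketch; the variable names via_apply and via_list are mine.

```python
import pandas as pd

# Sample data rebuilt from the question (values are placeholder strings).
df = pd.DataFrame({
    "A": [
        [{"k1": "v1", "k2": "v2"}, {"k1": "v3", "k2": "v4"}],
        [{"k1": "v5", "k2": "v6"}, {"k1": "v7", "k2": "v8"}, {"k1": "v9", "k2": "v10"}],
    ]
})

# Approach 1: expand each first dict into a row via apply(pd.Series).
via_apply = df["A"].str[0].apply(pd.Series)

# Approach 2: collect the first dicts in a plain list, then build the frame once.
via_list = pd.DataFrame([t[0] for t in df["A"]], df.index)

print(via_apply.values.tolist() == via_list.values.tolist())
```

The comprehension builds a single DataFrame from a list of dicts, whereas apply(pd.Series) constructs one Series per row, so I would expect the second form to scale better, but measure on your own data.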
Upvotes: 1
Reputation: 164673
Option 1
You should find pd.Series.apply more efficient than pd.DataFrame.apply, as you are using only one series as an input.
def extract_first(x):
    return list(x[0].values())

df['B'] = df['A'].apply(extract_first)
Option 2
You can also try using a list comprehension:
df['B'] = [list(x[0].values()) for x in df['A']]
In both the above cases, you can split into 2 columns via:
df[['C', 'D']] = df['B'].apply(pd.Series)
You should benchmark with your data to assess whether either of these options is fast enough for your use case.
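One rough way to run that benchmark is with timeit from the standard library. The frame below is synthetic, and the size n and the helper names series_apply and list_comp are assumptions for illustration; substitute your real data.

```python
import timeit

import pandas as pd

# Assumption: synthetic data shaped like the question's column A.
n = 2_000
df = pd.DataFrame(
    {"A": [[{"k1": i, "k2": -i}, {"k1": 0, "k2": 0}] for i in range(n)]}
)

def series_apply(frame):
    # Option 1: Series.apply on the single column.
    return frame["A"].apply(lambda x: list(x[0].values()))

def list_comp(frame):
    # Option 2: plain list comprehension.
    return [list(x[0].values()) for x in frame["A"]]

# Best of 3 repeats, each repeat timing 5 calls.
timings = {
    name: min(timeit.repeat(lambda f=func: f(df), number=5, repeat=3))
    for name, func in [("series_apply", series_apply), ("list_comp", list_comp)]
}

for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s for 5 calls")
```

Taking the minimum over repeats reduces noise from other processes; absolute numbers will vary by machine and pandas version.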
But really...
Look upstream to get your data in a more usable format. pandas will offer no vectorised functionality on a series of dictionaries. You should consider using just a list of dictionaries.
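As a sketch of what "upstream" could look like: assuming the raw data is available as a plain Python list of lists of dicts before it ever reaches pandas (the name records is hypothetical), you can pick the first dict from each list and build the frame directly.

```python
import pandas as pd

# Assumption: the raw data as it might exist before being put in a DataFrame.
records = [
    [{"k1": "v1", "k2": "v2"}, {"k1": "v3", "k2": "v4"}],
    [{"k1": "v5", "k2": "v6"}, {"k1": "v7", "k2": "v8"}, {"k1": "v9", "k2": "v10"}],
]

# Select the first dict of each list in plain Python, then construct once.
first_dicts = [row[0] for row in records]
df = pd.DataFrame(first_dicts).rename(columns=str.upper)

print(df)
```

This way each column holds scalar values, so the usual vectorised pandas operations apply to K1 and K2.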
Upvotes: 4