Reputation: 14094
My dataframe
Items Count ScannedCount
0 {'comp': {'S': '2019-08-02'}... 1032 1032
1 {'comp': {'S': '2019-08-27'}... 1032 1032
The items series looks like this
{'comp': {'S': '2019-08-02T16:54:55.035196+03:00'}, 'ID': {'S': '336'}, 'dID': {'S': '1763523'}, 'fname': {'S': '558012'}}
Using the second answer from this post allows me to convert the series to a dataframe. The issues is how to scale that operation since it happens on each row,
Current approach:
Looping through each row and concat them into a series (very slow)
item_df = pd.DataFrame(df['Items'].iloc[i]) for i in range(df.shape[0])]).reset_index(drop=True), df], axis=1)
Concat the results with the original dataframe
df = pd.concat([temp, df], axis=1)
I believe the for
loop in the first part is the bottleneck.
Is there a faster way to convert a series to dataframe and concat it back to the original dataframe.
Expected output:
comp ID dID fname Count ScannedCount
0 2019-08-02T16:54:55 336 1763523 548012 1032 1032
1 2019-09-01T14:52:24 336 1763523 528012 1032 1032
Upvotes: 3
Views: 790
Reputation: 1430
Try this more columnar solution. It works with assumptation that there is always 'S' key in dictionary.
df_tmp = df['Items'].apply(pd.Series)
for c in df_tmp.columns:
df[c] = df_tmp[c].apply(lambda x: x.get('S'))
df = df.drop(columns='Items')
Upvotes: 1
Reputation: 1126
test data:
import pandas as pd
test_data = {'item' : [{'comp': {'S': '2019-08-02T16:54:55.035196+03:00'}, 'ID': {'S': '336'}, 'dID': {'S': '1763523'}, 'fname': {'S': '558012'}}, {'comp': {'S': '2019-09-02T16:54:55.035196+03:00'}, 'ID': {'S': '336'}, 'dID': {'S': '1763523'}, 'fname': {'S': '558012'}}], 'Count': [1032,1032], 'ScannedCount':[1032,1032]}
df = pd.DataFrame.from_dict(test_data)
Out[64]:
item Count ScannedCount
0 {'comp': {'S': '2019-08-02T16:54:55.035196+03:... 1032 1032
1 {'comp': {'S': '2019-09-02T16:54:55.035196+03:... 1032 1032
As far as i understand, you try to get something like this:
def extract(row):
item_series = pd.Series({k:v for k,v in row['item'].items()})
result = row.append(item_series)
return result
df = df.apply(extract, axis = 1)
This gives you:
Out[67]:
comp ID dID \
0 {'S': '2019-08-02T16:54:55.035196+03:00'} {'S': '336'} {'S': '1763523'}
1 {'S': '2019-09-02T16:54:55.035196+03:00'} {'S': '336'} {'S': '1763523'}
fname
0 {'S': '558012'}
1 {'S': '558012'}
## skipped some columns for clarity
Upvotes: 0
Reputation: 1488
Looks like iterating over all the df is inevitable. I'm not sure if it's better, but I know the pandas way of doing so is by using iterrows()
.
In the docs they also mention itertuples()
for specific use cases, but again, I'm no expert here.
Hope this helps!
Upvotes: 0