Reputation: 68
I have a large data set with over 350,000 rows. Now one of the columns contains a single value dictionary as its row values, and I was to assign each unique key as a new column in the data frame and the value as the row value in the right [row, column] position.
Here's the code I was hoping to use but due to a large number of rows it's taking too long.
idx = 0
for row in df['value']:
for key in row:
if key not in df.columns.tolist():
df[key] = 0
df.loc[idx,key] = row[key]
else:
df.loc[idx,key] = row[key]
idx += 1
Here's the sample data
import pandas as pd
df = pd.DataFrame({'time': [12,342,786],
'event': ['offer received', 'transaction', 'offer viewed'],
'value': [{'offer id': '2906b810c7d4411798c6938adc9daaa5'}, {'amount': 0.35000000000000003},
{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}]
})
Here the expected output:
df2 = pd.DataFrame({'time': [12,342,786],
'event': ['offer received', 'transaction', 'offer viewed'],
'value': [{'offer id': '2906b810c7d4411798c6938adc9daaa5'}, {'amount': 0.35000000000000003},
{'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}],
'offer id': ['2906b810c7d4411798c6938adc9daaa5', 0, '0b1e1539f2cc45b7b9fa7c272da2e1d7' ],
'amount': [0, 0.35000000000000003, 0]
})
Upvotes: 0
Views: 48
Reputation: 9018
df["offer id"] = df["value"].apply(lambda d: d["offer id"] if "offer id" in d else 0)
df["amount"] = df["value"].apply(lambda d: d["amount"] if "amount" in d else 0)
Output:
time event value \
0 12 offer received {'offer id': '2906b810c7d4411798c6938adc9daaa5'}
1 342 transaction {'amount': 0.35000000000000003}
2 786 offer viewed {'offer id': '0b1e1539f2cc45b7b9fa7c272da2e1d7'}
offer id amount
0 2906b810c7d4411798c6938adc9daaa5 0.00
1 0 0.35
2 0b1e1539f2cc45b7b9fa7c272da2e1d7 0.00
Upvotes: 1