Reputation: 2810
I have a Dataframe that has a column value that looks like the following:
[
{
"OrderID" : "0",
"TimeStamp" : "2019-09-24 10:17:48 +0000",
"Screen" : "Home_Screen",
"StateVars" : "",
"Event" : "A"
},
{
"Event" : "B",
"TimeStamp" : "2019-09-24 10:17:38 +0000",
"Screen" : "Home_Screen",
"StateVars" : "",
"OrderID" : "0"
},
{
"OrderID" : "0",
"TimeStamp" : "2019-09-24 10:17:35 +0000",
"Screen" : "Home_Screen",
"StateVars" : "",
"Event" : "D"
},
{
"Event" : "V",
"TimeStamp" : "2019-09-24 10:17:33 +0000",
"Screen" : "Home_Screen",
"StateVars" : "",
"OrderID" : "0"
},
{
"OrderID" : "0",
"TimeStamp" : "2019-09-24 10:17:32 +0000",
"Screen" : "Home_Screen",
"StateVars" : "",
"Event" : "C"
}
]
I want make columns of all the keys. So, original data frame looks like this:
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
| | O | v | S | I | EventLog | CustomerID | a | b | c | d | e | f |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
| 0 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 3 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | NaN | NaN | NaN | NaN | NaN | NaN |
| 4 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 15 | NaN | NaN | NaN | NaN | NaN | NaN |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
And I am looking for something like this
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+
| | O | v | S | I | EventLog | CustomerID |OrdeID| TimeStamp |Screen | StarsVar |Event |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+
| 0 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | 0 | 2019-09-24 10:17:33 +0000 | Home_Screen | NaN | A |
| 1 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | 0 | 2019-09-24 10:17:33 +0000 | Home_Screen | NaN | B |
| 2 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | 0 | 2019-09-24 10:17:33 +0000 | Home_Screen | NaN | C |
| 3 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | 0 | 2019-09-24 10:17:33 +0000 | Home_Screen | NaN | D |
| 4 | 1 | 0.4 | OS | 92D42D7E-68F0-4688-83C5-781920E05335 | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:... | 1 | 0 | 2019-09-24 10:17:33 +0000 | Home_Screen | NaN | E |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+
don't necessarily need to delete columns as shown in above output.
Upvotes: 3
Views: 567
Reputation: 862406
Create DataFrame
by constructor first:
df1 = pd.DataFrame(df['EventLog'].values.tolist())
print (df1)
OrderID TimeStamp Screen StateVars Event
0 0 2019-09-24 10:17:48 +0000 Home_Screen A
1 0 2019-09-24 10:17:38 +0000 Home_Screen B
2 0 2019-09-24 10:17:35 +0000 Home_Screen D
3 0 2019-09-24 10:17:33 +0000 Home_Screen V
4 0 2019-09-24 10:17:32 +0000 Home_Screen C
And for add to original:
df = df.join(df1)
print (df)
EDIT: I think there are some missing values, so solution is replace them to empty dict - finally it create missing values:
print (df)
EventLog
0 {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
1 {'Event': 'B', 'TimeStamp': '2019-09-24 10:17:...
2 {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
3 {'Event': 'V', 'TimeStamp': '2019-09-24 10:17:...
4 {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
5 NaN
df = pd.DataFrame([x if x ==x else {} for x in df['EventLog']])
print (df)
OrderID TimeStamp Screen StateVars Event
0 0 2019-09-24 10:17:48 +0000 Home_Screen A
1 0 2019-09-24 10:17:38 +0000 Home_Screen B
2 0 2019-09-24 10:17:35 +0000 Home_Screen D
3 0 2019-09-24 10:17:33 +0000 Home_Screen V
4 0 2019-09-24 10:17:32 +0000 Home_Screen C
5 NaN NaN NaN NaN NaN
Another Solution:
a=df['EventLog'].values.tolist()
a = [x for x in a if x == x]
empty_df=pd.DataFrame()
for i in range(0, len(a)):
b=a[i]
for j in range(0, len(b)):
c=b[j]
empty_df=empty_df.append(c, ignore_index=True, sort=False)
df = df.join(empty_df)
Upvotes: 4