M_S_N
M_S_N

Reputation: 2810

pandas:make new columns from a list stored as column value

I have a Dataframe that has a column value that looks like the following:

[
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:48 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "A"
    },
    {
      "Event" : "B",
      "TimeStamp" : "2019-09-24 10:17:38 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:35 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "D"
    },
    {
      "Event" : "V",
      "TimeStamp" : "2019-09-24 10:17:33 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "OrderID" : "0"
    },
    {
      "OrderID" : "0",
      "TimeStamp" : "2019-09-24 10:17:32 +0000",
      "Screen" : "Home_Screen",
      "StateVars" : "",
      "Event" : "C"
    }
  ]

I want make columns of all the keys. So, original data frame looks like this:


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  |  a   |  b   |  c   |  d   |  e   |  f  |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         15  | NaN  | NaN  | NaN  | NaN  | NaN  | NaN |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+------+------+------+------+-----+

And I am looking for something like this


+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+
|    | O          | v           | S       |               I                       |                     EventLog                       | CustomerID  |OrdeID|  TimeStamp                 |Screen        | StarsVar   |Event |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+
| 0  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0    | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | A    |
| 1  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0    | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | B    |
| 2  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0    | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | C    |
| 3  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0    | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | D    |
| 4  |      1     |        0.4  |  OS     | 92D42D7E-68F0-4688-83C5-781920E05335  | [{'OrderID': '0', 'TimeStamp': '2019-09-24 10:...  |         1   | 0    | 2019-09-24 10:17:33 +0000  | Home_Screen  | NaN        | E    |
+----+------------+-------------+---------+---------------------------------------+----------------------------------------------------+-------------+------+----------------------------+--------------+------------+------+

don't necessarily need to delete columns as shown in above output.

Upvotes: 3

Views: 567

Answers (1)

jezrael
jezrael

Reputation: 862406

Create DataFrame by constructor first:

df1 = pd.DataFrame(df['EventLog'].values.tolist())
print (df1)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C

And for add to original:

df = df.join(df1)
print (df)

EDIT: I think there are some missing values, so solution is replace them to empty dict - finally it create missing values:

print (df)
                                            EventLog
0  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
1  {'Event': 'B', 'TimeStamp': '2019-09-24 10:17:...
2  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
3  {'Event': 'V', 'TimeStamp': '2019-09-24 10:17:...
4  {'OrderID': '0', 'TimeStamp': '2019-09-24 10:1...
5                                                NaN

df = pd.DataFrame([x if x ==x else {} for x in df['EventLog']])
print (df)
  OrderID                  TimeStamp       Screen StateVars Event
0       0  2019-09-24 10:17:48 +0000  Home_Screen               A
1       0  2019-09-24 10:17:38 +0000  Home_Screen               B
2       0  2019-09-24 10:17:35 +0000  Home_Screen               D
3       0  2019-09-24 10:17:33 +0000  Home_Screen               V
4       0  2019-09-24 10:17:32 +0000  Home_Screen               C
5     NaN                        NaN          NaN       NaN   NaN

Another Solution:

a=df['EventLog'].values.tolist()
a = [x for x in a if x == x]
empty_df=pd.DataFrame()
for i in range(0, len(a)):
    b=a[i]
    for j in range(0, len(b)):
        c=b[j]
        empty_df=empty_df.append(c, ignore_index=True, sort=False)
df = df.join(empty_df)

Upvotes: 4

Related Questions