Reputation: 283
There is a dataframe like this:
id product day
0 8 22 23
1 8 32 24
2 8 70 23
3 8 141 23
4 8 160 24
... ... ... ...
210794 71564 7325 23
210795 71564 8528 23
210796 71564 8596 23
210797 71564 8622 23
210798 71564 8636 23
And the following information:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110205 entries, 0 to 110204
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 110205 non-null int64
1 product 110205 non-null int64
2 day 110205 non-null int64
dtypes: int64(3)
memory usage: 4.8 MB
The details of this dataframe are important because some Series and list operations do not work well on it.
What I would like to have is a new dataframe that:
1- combines all the products that share the same day into one group.
For example, from this:
+---+----+---------+-----+
| | id | product | day |
+---+----+---------+-----+
| 0 | 8 | 22 | 23 |
+---+----+---------+-----+
| 1 | 8 | 23 | 24 |
+---+----+---------+-----+
| 2 | 8 | 70 | 23 |
+---+----+---------+-----+
| 3 | 8 | 141 | 23 |
+---+----+---------+-----+
| 4 | 8 | 160 | 24 |
+---+----+---------+-----+
we would first get this:
+---+----+---------+-----+---------------+
| | id | product | day | (first step) |
+---+----+---------+-----+---------------+
| 0 | 8 | 22 | 23 | [22, 70, 141] |
+---+----+---------+-----+---------------+
| 1 | 8 | 23 | 24 | [23, 160] |
+---+----+---------+-----+---------------+
2- then collects all of these groups into a single column, one row per user, in a new dataframe:
+-----+-----+----------------------------+
| | id | (second step) |
+-----+-----+----------------------------+
| 0 | 8 | [[22, 70, 141], [23, 160]] |
+-----+-----+----------------------------+
| ... | ... | ... |
+-----+-----+----------------------------+
And so on for the other users in the list.
It is important that the result can finally be written to a JSON file, so the error below needs to be taken into account:
"TypeError: Object of type 'DataFrame' is not JSON serializable"
Clear, simple and easy-to-understand code would be highly appreciated. Thank you.
By the way, I tried the code below, which gives horrible output:
df['final'] = df.set_index('id').values.tolist()
df = df.groupby('id')['final'].apply(list)
Upvotes: 0
Views: 190
Reputation: 2514
What about this:
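# Step 1: one list of products per (id, day); ['product'] avoids any clash with the DataFrame.product method
df2 = df.groupby(['id', 'day'])['product'].apply(list).reset_index()
# Step 2: one list of those per-day lists per id
df3 = df2.groupby('id')['product'].apply(list).reset_index()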
I'm assuming you made a typo when describing the output of your step 1: it doesn't seem to make sense (yet?) to keep the original "product" column there, so in df2 that column holds the list of products instead.
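To check the two steps against the small example from the question, here is a minimal sketch. The sample rows are copied from the question's example table; the names per_day and per_id are only illustrative.

```python
import pandas as pd

# Sample rows copied from the example table in the question
df = pd.DataFrame({
    "id": [8, 8, 8, 8, 8],
    "product": [22, 23, 70, 141, 160],
    "day": [23, 24, 23, 23, 24],
})

# Step 1: one list of products per (id, day) pair
per_day = df.groupby(["id", "day"])["product"].apply(list).reset_index()

# Step 2: one list of those per-day lists per id
per_id = per_day.groupby("id")["product"].apply(list).reset_index()

print(per_day)
print(per_id)
```

For id 8 this should give [22, 70, 141] for day 23 and [23, 160] for day 24 in the first step, and the nested list [[22, 70, 141], [23, 160]] in the second.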
Edit:
You can convert your dataframe to a JSON string using pandas' to_json method (called without a path, it returns the JSON as a string).
For instance:
df2.to_json()
df3.to_json(orient="records")
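Since the goal is a JSON file, here is a rough sketch of two ways to write one. The file names products.json and products_alt.json are made up for the example, and df3 is rebuilt by hand here so that the snippet runs on its own. The TypeError from the question typically appears when the DataFrame itself is handed to json.dump, so the second variant converts it to plain Python objects first.

```python
import json
import pandas as pd

# Hand-built stand-in for df3 from above (one row per id, nested lists of products)
df3 = pd.DataFrame({"id": [8], "product": [[[22, 70, 141], [23, 160]]]})

# Variant A: let pandas write the file directly (file name is hypothetical)
df3.to_json("products.json", orient="records")

# Variant B: go through plain Python objects, then use the json module.
# json.dump(df3, fh) would raise the TypeError, so convert first.
records = json.loads(df3.to_json(orient="records"))
with open("products_alt.json", "w") as fh:
    json.dump(records, fh, indent=2)
```

Both files should contain one record per id, with the nested product lists stored as JSON arrays.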
Upvotes: 1