dxb

Reputation: 283

Python DataFrame: the best and easiest way to group rows

There is a dataframe like this:

           id  product  day
0           8       22   23
1           8       32   24
2           8       70   23
3           8      141   23
4           8      160   24
...       ...      ...  ...
210794  71564     7325   23
210795  71564     8528   23
210796  71564     8596   23
210797  71564     8622   23
210798  71564     8636   23

And the following information:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 110205 entries, 0 to 110204
Data columns (total 3 columns):
 #   Column   Non-Null Count   Dtype
---  ------   --------------   -----
 0   id       110205 non-null  int64
 1   product  110205 non-null  int64
 2   day      110205 non-null  int64
dtypes: int64(3)
memory usage: 4.8 MB

The details of this dataframe are important because some Series and list operations do not work well on it.

What I would like is a new dataframe that: 1- combines all the products that share the same day into one category.

For example, starting from this:

+---+----+---------+-----+
|   | id | product | day |
+---+----+---------+-----+
| 0 | 8  | 22      | 23  |
+---+----+---------+-----+
| 1 | 8  | 23      | 24  |
+---+----+---------+-----+
| 2 | 8  | 70      | 23  |
+---+----+---------+-----+
| 3 | 8  | 141     | 23  |
+---+----+---------+-----+
| 4 | 8  | 160     | 24  |
+---+----+---------+-----+

So first we get this:

+---+----+---------+-----+---------------+
|   | id | product | day | (first step)  |
+---+----+---------+-----+---------------+
| 0 | 8  | 22      | 23  | [22, 70, 141] |
+---+----+---------+-----+---------------+
| 1 | 8  | 23      | 24  | [23, 160]     |
+---+----+---------+-----+---------------+

2- secondly, puts all these categories into one column per user in a new dataframe.

+-----+-----+----------------------------+
|     | id  | (second step)              |
+-----+-----+----------------------------+
| 0   | 8   | [[22, 70, 141], [23, 160]] |
+-----+-----+----------------------------+
| ... | ... | ...                        |
+-----+-----+----------------------------+

And so on for the other users in the list.

It is important that the final result can be written to a JSON file, so the error below needs to be taken into account:

"TypeError: Object of type 'DataFrame' is not JSON serializable"

Clear, simple, easy-to-understand code would be highly appreciated. Thank you.

By the way, I tried the code below, which gives horrible output:

Faulty code:
df['final'] = df.set_index('id').values.tolist()
df = df.groupby('id')['final'].apply(list)

Upvotes: 0

Views: 190

Answers (1)

tgrandje

Reputation: 2514

What about this:

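# Step 1: collect the products of each (id, day) pair into a list.
df2 = df.groupby(['id', 'day'])['product'].apply(list).reset_index(drop=False)
# Step 2: collect those per-day lists into one list per id.
df3 = df2.groupby('id')['product'].apply(list).reset_index(drop=False)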

I'm assuming here that you made a typo when describing your step #1 output. It doesn't seem to make any sense (yet?) to keep the original "product" column there...
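
The two groupby steps above can be tried on the five sample rows from the question. This is only a minimal sketch: the small dataframe is rebuilt by hand from the question's example table, and the column is selected with ['product'] rather than .product, a choice made here to avoid any clash with the pandas method of the same name.

```python
import pandas as pd

# Five sample rows copied from the example table in the question.
df = pd.DataFrame({
    "id":      [8, 8, 8, 8, 8],
    "product": [22, 23, 70, 141, 160],
    "day":     [23, 24, 23, 23, 24],
})

# Step 1: one list of products per (id, day) pair.
df2 = df.groupby(["id", "day"])["product"].apply(list).reset_index()

# Step 2: one list of those per-day lists per id.
df3 = df2.groupby("id")["product"].apply(list).reset_index()

print(df2)
print(df3)
```

On this sample, df2 has one row per day (23 and 24) and df3 has a single row for id 8 whose "product" cell holds [[22, 70, 141], [23, 160]], which matches the second-step table in the question.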

Edit:

You can convert your dataframe to a JSON string using pandas' to_json method.

For instance:

df2.to_json()
df3.to_json(orient="records")
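
To get an actual file, to_json also accepts a path as its first argument. The sketch below assumes the TypeError from the question came from handing the dataframe straight to the standard json module, which only knows built-in Python types. The one-row df3 is a hand-made stand-in for the result above, and the file name grouped.json is arbitrary.

```python
import json
import pandas as pd

# Stand-in for df3: one row per id, with the nested product lists.
df3 = pd.DataFrame({"id": [8], "product": [[[22, 70, 141], [23, 160]]]})

# Without a path, to_json returns the JSON text as a string.
json_text = df3.to_json(orient="records")

# With a path, to_json writes the file itself.
df3.to_json("grouped.json", orient="records")

# Reading the file back with the json module checks that it is valid JSON.
with open("grouped.json") as fh:
    records = json.load(fh)

print(records)
```

With orient="records" each row becomes one JSON object, so every user ends up as an entry with an "id" key and a "product" key holding the nested lists.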

Upvotes: 1
