a1234

Reputation: 821

pandas group by value & create new data frames?

I have the following sample dataframe:

id      shcool_id   time_created
710     1045152     2019-07-26 15:10:26
5141    6853654     2020-10-07 11:32:30
2278    3460257     2019-11-01 17:31:11
3877    2186089     2020-02-14 14:53:43
3877    1841367     2020-02-14 14:53:43
2019    3266938     2019-11-01 12:40:35
4910    1608407     2020-09-21 15:47:40
3926    4480633     2020-02-14 16:07:04
3447    5416477     2020-01-17 13:13:36

I would like to group this dataframe by id so that I have several dataframes such as:

df1=id      shcool_id   time_created
      710     1045152     2019-07-26 15:10:26

df2=id      shcool_id   time_created
     5141    6853654     2020-10-07 11:32:30

df3=id      shcool_id   time_created
     2278    3460257     2019-11-01 17:31:11

df4=id      shcool_id   time_created
     3877    2186089     2020-02-14 14:53:43
     3877    1841367     2020-02-14 14:53:43

df5=id      shcool_id   time_created
    2019    3266938     2019-11-01 12:40:35

df6=id      shcool_id   time_created
    4910    1608407     2020-09-21 15:47:40

df7=id      shcool_id   time_created
    3926    4480633     2020-02-14 16:07:04

df8=id      shcool_id   time_created
     3447    5416477     2020-01-17 13:13:36

df9=id      shcool_id   time_created
    1935    2788320     2019-10-31 14:10:46

I don't know in advance how many unique ids there are, so I'm looking for a way to do this without hard-coding the number of dataframes.

Sorry if this has been asked before. I did search, but maybe I'm not searching for the correct phrase ¯_(ツ)_/¯

Thank you so much in advance!

Upvotes: 0

Views: 27

Answers (2)

Sayandip Dutta

Reputation: 15872

If you want the dataframes to be globally available, you will have to assign to globals():

>>> for i, (_, v) in enumerate(df.groupby('id'), start=1):
...     globals()[f'df{i}'] = v

# Now all the new dfs will be available globally

>>> df1
    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26

But it is probably better to create a dict:

>>> database = {f'df{i}': v for i, (_, v) in enumerate(df.groupby('id'), start=1)}

>>> database['df1']
    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26

If you want to access the dataframes by their group key (the id value) instead:

>>> database = dict(list(df.groupby('id')))

>>> database[710]
    id  shcool_id         time_created
0  710    1045152  2019-07-26 15:10:26
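One detail worth noting: groupby sorts the group keys by default, whereas the question lists the frames in order of first appearance. A minimal sketch of keeping that order with sort=False, assuming a small hand-copied subset of the question's sample data (the name frames_by_id is illustrative only):

```python
import pandas as pd

# Illustrative subset of the sample data from the question.
df = pd.DataFrame({
    "id": [710, 5141, 3877, 3877],
    "shcool_id": [1045152, 6853654, 2186089, 1841367],
    "time_created": [
        "2019-07-26 15:10:26",
        "2020-10-07 11:32:30",
        "2020-02-14 14:53:43",
        "2020-02-14 14:53:43",
    ],
})

# sort=False yields groups in order of first appearance
# rather than in sorted key order.
frames_by_id = {key: group for key, group in df.groupby("id", sort=False)}

# Keys follow the original row order; each value is a sub-dataframe.
print([int(key) for key in frames_by_id])
print(frames_by_id[3877])
```

Each sub-dataframe keeps its original row index, so frames_by_id[3877] still carries the index labels it had in df.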

Upvotes: 1

Ajay Rawat

Reputation: 258

Here df is your original dataframe. df_list will be a list containing all the dataframes, split according to id:

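df_list = []
uniq_ids = df['id'].unique()  # unique ids, in order of first appearance
for uid in uniq_ids:  # 'uid' avoids shadowing the built-in id()
    new_df = df[df['id'] == uid]  # rows belonging to this id
    df_list.append(new_df)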

Sample output:

df_list[2]
     id    shcool_id    time_created
2   2278    3460257     2019-11-01 17:31:11

df_list[3]
     id     shcool_id   time_created
3   3877    2186089     2020-02-14 14:53:43
4   3877    1841367     2020-02-14 14:53:43
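The loop above scans the whole dataframe once per unique id. As a hedged alternative, the same list can probably be built in a single groupby pass. The sketch below assumes a small hand-copied subset of the sample data and checks that both approaches agree (loop_list and groupby_list are illustrative names):

```python
import pandas as pd

# Illustrative subset of the sample data from the question.
df = pd.DataFrame({
    "id": [710, 5141, 2278, 3877, 3877],
    "shcool_id": [1045152, 6853654, 3460257, 2186089, 1841367],
    "time_created": [
        "2019-07-26 15:10:26",
        "2020-10-07 11:32:30",
        "2019-11-01 17:31:11",
        "2020-02-14 14:53:43",
        "2020-02-14 14:53:43",
    ],
})

# Approach from the answer: one boolean mask per unique id.
loop_list = [df[df["id"] == uid] for uid in df["id"].unique()]

# Single pass: sort=False keeps the same first-appearance order as unique().
groupby_list = [group for _, group in df.groupby("id", sort=False)]

# Both lists hold the same sub-dataframes in the same order.
same = all(a.equals(b) for a, b in zip(loop_list, groupby_list))
print(len(groupby_list), same)
```

For a handful of ids the difference is negligible. The groupby version mainly helps when there are many distinct ids.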

Upvotes: 0
