Reputation:
The data in my CSV looks like this:
staff_id clock_time device_id latitude longitude
1001 2020/9/20 7:26 d_1 24.48237852 118.1558955
1001 2020/9/20 5:30 d_1 24.59689407 118.0863806
1001 2020/9/18 4:17 d_2 24.59222786 118.0955275
1001 2020/9/16 3:33 d_2 24.59208312 118.0957197
1001 2020/9/15 8:34 d_2 24.59732974 118.0859644
1001 2020/9/14 4:43 d_2 25.68714724 119.3918519
1002 2020/9/13 1:17 d_3 24.58618235 118.1065715
1002 2020/9/11 9:20 d_3 24.63024186 118.0667598
1002 2020/9/10 7:22 d_3 24.48287497 118.1542957
1002 2020/9/7 5:38 d_4 25.07601853 118.7335211
1003 2020/9/5 5:44 d_6 24.59803941 118.0863071
1003 2020/9/4 7:37 d_6 24.48285023 118.1545752
1003 2020/9/3 2:38 d_6 24.6381382 118.0677933
1003 2020/8/31 6:43 d_7 24.49278011 118.1395677
1003 2020/8/30 11:41 d_7 24.59205252 118.0955596
1003 2020/8/29 3:35 d_7 24.51817637 118.1764342
1003 2020/8/28 5:05 d_7 24.59603175 118.0846872
1003 2020/8/27 2:55 d_8 26.39899424 117.7866387
1003 2020/8/26 7:45 d_8 26.39900029 117.7866379
1003 2020/8/26 3:09 d_8 26.40672436 117.8008659
1003 2020/8/26 0:26 d_8 26.89169118 117.1612365
1003 2020/8/25 9:38 d_8 26.89764297 117.1760012
1003 2020/5/19 8:29 d_8 24.47420087 118.1085551
1003 2020/5/18 9:06 d_8 24.473124 118.1705641
1003 2020/5/16 7:54 d_8 24.5101858 117.8954614
I want to split the DataFrame into sub-DataFrames according to staff_id and device_id, and put these sub-DataFrames into a list. For example:
sub-dataframe1 is:
1001 2020/9/20 7:26 d_1 24.48237852 118.1558955
1001 2020/9/20 5:30 d_1 24.59689407 118.0863806
sub-dataframe2 is:
1001 2020/9/18 4:17 d_2 24.59222786 118.0955275
1001 2020/9/16 3:33 d_2 24.59208312 118.0957197
1001 2020/9/15 8:34 d_2 24.59732974 118.0859644
1001 2020/9/14 4:43 d_2 25.68714724 119.3918519
sub-dataframe3 is:
1002 2020/9/13 1:17 d_3 24.58618235 118.1065715
1002 2020/9/11 9:20 d_3 24.63024186 118.0667598
1002 2020/9/10 7:22 d_3 24.48287497 118.1542957
And so on.
How can I do this?
My code:
import pandas as pd
df = pd.read_csv(r'for_test.csv', sep=',', encoding='utf-8')
gb = df.groupby(['staff_id','device_id'])
Upvotes: 0
Views: 114
Reputation: 832
If you want separate DataFrames, you just need to iterate over the GroupBy object you already created:
import pandas as pd
df = pd.read_csv(r'for_test.csv', sep=',', encoding='utf-8')
gb = df.groupby(['staff_id','device_id'])
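l = []
# gb.indices maps each (staff_id, device_id) key to the row positions of that group
for key in gb.indices:
    # get_group already returns a DataFrame; use a new name so the original df is not overwritten
    sub_df = gb.get_group(key)
    l.append(sub_df)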
At the end, the list "l" holds one DataFrame per staff_id/device_id pair. There are cleaner ways to iterate over the GroupBy object, but this works.
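One of the cleaner ways is to iterate over the GroupBy object directly, since each iteration yields a (key, sub-DataFrame) pair. The following is a minimal sketch, not the original poster's code: it assumes the file is comma-separated with a header row, and it uses a small in-memory sample (the hypothetical `csv_text`) in place of for_test.csv so that it runs on its own. The name `sub_frames` is also just illustrative.

```python
import io

import pandas as pd

# Hypothetical in-memory sample standing in for for_test.csv
# (assumes comma-separated values with a header row).
csv_text = """staff_id,clock_time,device_id,latitude,longitude
1001,2020/9/20 7:26,d_1,24.48237852,118.1558955
1001,2020/9/20 5:30,d_1,24.59689407,118.0863806
1001,2020/9/18 4:17,d_2,24.59222786,118.0955275
1002,2020/9/13 1:17,d_3,24.58618235,118.1065715
"""
df = pd.read_csv(io.StringIO(csv_text))

# Iterating over a GroupBy yields (key, sub-DataFrame) pairs.
# sort=False keeps the groups in order of first appearance in the data.
sub_frames = [
    group for _, group in df.groupby(["staff_id", "device_id"], sort=False)
]

for sub in sub_frames:
    print(sub, end="\n\n")
```

Each element of `sub_frames` keeps the row index of the original DataFrame; calling `reset_index(drop=True)` on a group gives it a fresh index starting at 0 if that is preferred.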
Upvotes: 1
Reputation: 475
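Try this. Note that it groups by device_id only; to split on both columns, pass ['staff_id', 'device_id'] to groupby instead.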
import pandas as pd
df = pd.read_csv(r'for_test.csv', sep=',', encoding='utf-8')
gb = df.groupby('device_id')
# print(gb.first())
# Print a single group, for example 'd_1'
print(gb.get_group('d_1'))
# Convert that group's rows to a list of lists
print(gb.get_group('d_1').values.tolist())
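When grouping on both columns, as the question asks, the key passed to get_group becomes a tuple with one value per grouping column. A small sketch under stated assumptions: the three-row DataFrame below is made-up illustration data with rounded coordinates, not the asker's file, and staff_id is assumed to be parsed as an integer.

```python
import pandas as pd

# Made-up illustration data (not the asker's file).
df = pd.DataFrame(
    {
        "staff_id": [1001, 1001, 1002],
        "device_id": ["d_1", "d_2", "d_3"],
        "latitude": [24.48, 24.59, 24.58],
    }
)

gb = df.groupby(["staff_id", "device_id"])

# With two grouping columns, the group key is a (staff_id, device_id) tuple.
one_group = gb.get_group((1001, "d_2"))

# Convert the rows of that group to a list of lists.
rows = one_group.values.tolist()
print(rows)
```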
Upvotes: 0