Reputation: 481
I have a data frame containing different sub areas. All I want is to be able to group them together and count them in another data frame. Here is how my data frame called data looks like:
Area
0 TEX001
1 TEX002
2 TEX003
3 ARZ004
4 ARZ015
5 MSS034
6 NWY001
7 NWY003
I have tried a loop to retrieve the information I needed using a loop but I only get lots of boolean values.
Here is what I tried:
u_area=list(data['U_area'].unique())
listo=[]
for i in u_area:
d={}
d['Sector']= i
d['Sec_name]=list(data['Area'].str.startswith(i))
print (x)
d['Total']= len(d['sites'])
listo.append(d)
I am hoping to have something a list containing many dictionaries like the below so that I can data frame my list with pandas to have someting like the following:
Sector Sec_name Total
TEX TEX001,TEX002,TEX003, 3
ARZ ARZ004, ARZ035 2
MSS MSS034 1
what I am doing wrong to have those boolean values??
Upvotes: 3
Views: 66
Reputation: 1284
Try this:
import pandas as pd
# Test data.
data = pd.DataFrame({'area': ['TEX001', 'TEX002', 'TEX003', 'ARZ004', 'ARZ015', 'MSS034', 'NWY001', 'NWY003']})
# Slice out the sector, assumes this is always the first three characters of "area".
data['sector'] = data['area'].str[:3]
# Group and aggregate.
grouped = data.groupby('sector')['area'].agg(['count', lambda x: ', '.join(x)])
# Clean up the column names.
grouped.columns = ['count', 'sec_name']
print(grouped)
Gives:
count sec_name
sector
ARZ 2 ARZ004, ARZ015
MSS 1 MSS034
NWY 2 NWY001, NWY003
TEX 3 TEX001, TEX002, TEX003
Upvotes: 1
Reputation: 323316
IIUC with agg
and groupby
yourdf=df.groupby(df['Area'].str[:3])['Area'].agg([','.join,'count'])
yourdf.columns=['Sec_name','Total']
yourdf# if need reset_index adding yourdf.reset_index(inplace=True)
Out[17]:
Sec_name Total
Area
ARZ ARZ004,ARZ015 2
MSS MSS034 1
NWY NWY001,NWY003 2
TEX TEX001,TEX002,TEX003 3
Upvotes: 1