Engineering data structure/ information retrieval and storage

Question

I have a data frame containing different sub areas. All I want is to be able to group them together and count them in another data frame. Here is how my data frame called data looks like:

    Area
0 TEX001
1 TEX002
2 TEX003
3 ARZ004
4 ARZ015
5 MSS034
6 NWY001
7 NWY003

I have tried a loop to retrieve the information I needed using a loop but I only get lots of boolean values.

Here is what I tried:

u_area=list(data['U_area'].unique())
listo=[]

for i in u_area:
      d={}
      d['Sector']= i
      d['Sec_name]=list(data['Area'].str.startswith(i))
      print (x)
      d['Total']= len(d['sites'])
      listo.append(d)

I am hoping to have something a list containing many dictionaries like the below so that I can data frame my list with pandas to have someting like the following:

    Sector   Sec_name              Total
     TEX     TEX001,TEX002,TEX003,  3
     ARZ     ARZ004, ARZ035         2
     MSS     MSS034                 1

what I am doing wrong to have those boolean values??

BENY · Accepted Answer

IIUC with agg and groupby

yourdf=df.groupby(df['Area'].str[:3])['Area'].agg([','.join,'count'])
yourdf.columns=['Sec_name','Total']
yourdf# if need reset_index adding yourdf.reset_index(inplace=True)
Out[17]: 
                  Sec_name  Total
Area                             
ARZ          ARZ004,ARZ015      2
MSS                 MSS034      1
NWY          NWY001,NWY003      2
TEX   TEX001,TEX002,TEX003      3

Engineering data structure/ information retrieval and storage

Answers (2)

Related Questions