jch
jch

Reputation: 187

Pandas Dataframe NameError: I can print the dataframe but I get name '' is not defined error when I attempt to aggregate columns

Does anyone have suggestions why I can't run a sum on the last dataframe?

Also open to suggestions if there's a shorter way to accomplish splitting out the tags and summing the frequencies.

import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)pandas_profiling

print("\nCalculate aggregates for tags :\n")
TagsDFGroupBy =  df.groupby(['Tags','Lab Location' ]).agg({'ADO ID': ['count']}).rename(columns={'ADO ID':'WorkItemCnt'}).reset_index()
print(TagsDFGroupBy)

Produces the output

                                                          Tags |  Labs | WorkItemCNT  |                                                --------------------------------------------------------------| ----- | ---        |
0|                                                        A2040|RXY Lab|1             |
1|                                      AWAITING COMMODITY QUAL|RXY Lab|1             |
2|                                                          DNR|RXY Lab|18            |  
3|                                         DNR; MISSING SKU DOC|RXY Lab|17            |
4|  MISSING QUAL INTAKE REQUEST; MISSING SKU DOC; NEED HARDWARE|QXR Lab|1             | 
5|                                              MISSING SKU DOC|RXY Lab|2             | 
6|                               MISSING SKU DOC; NEED HARDWARE|RXY Lab|1             |
7|                                     MISSING SKU DOC; NEED RA|RXY Lab|1             |
8|                                                NEED HARDWARE|RXY Lab|7             |
9|                                                NEED HARDWARE|VYZ Lab|4             |

And then I run code to split out the tags and sum the frequencies


print("\nSplit tags by semicolumn delimiter" )  
TagsDFGroupBy[['Tag1','Tag2','Tag3']] = TagsDFGroupBy.Tags.str.split(";",expand=True)

print("\nReplace none with blanks")  
mask = TagsDFGroupBy.applymap(lambda x: x is None)
cols = TagsDFGroupBy.columns[(mask).any()]
for col in TagsDFGroupBy[cols]:
    TagsDFGroupBy.loc[mask[col], col] = ''


print("\n3 different dataframes")  
TagsDFGroupBy1 = TagsDFGroupBy[['Lab Location','Tag1','WorkItemCnt']].rename(columns={'Tag1':'TagSplit'})
TagsDFGroupBy2 = TagsDFGroupBy[['Lab Location','Tag2','WorkItemCnt']].rename(columns={'Tag2':'TagSplit'})
TagsDFGroupBy3 = TagsDFGroupBy[['Lab Location','Tag3','WorkItemCnt']].rename(columns={'Tag3':'TagSplit'})


print("\nCombine 3 different dataframes into 1")  
TagsConcat = pd.concat([TagsDFGroupBy1, TagsDFGroupBy2, TagsDFGroupBy3], ignore_index=True)

# Get names of indexes for which TagSplit has a blank value
indexNames = TagsConcat[TagsConcat['TagSplit'] == '' ].index
# Delete these row indexes from dataFrame
TagsConcat.drop(indexNames , inplace=True)
TagsConcat.reset_index()
print('TagsConcat')
print(TagsConcat)

Which produces this output

       Lab Location                TagSplit         WorkItemCnt
                                                        count
 --------------|-------------------------------------- | ----------|
0       RXY LAB|                       A2040           |1
1       RXY LAB|     AWAITING COMMODITY QUAL           |1
2       RXY LAB|                         DNR           |18
3       RXY LAB|                         DNR           |17
4       QXR LAB|  MISSING QUAL INTAKE REQUEST          |1
5       RXY LAB|             MISSING SKU DOC           |2
6       RXY LAB|             MISSING SKU DOC           |1
7       RXY LAB|             MISSING SKU DOC           |1
8       RXY LAB|               NEED HARDWARE           |7
9       VYZ LAB|                NEED HARDWARE          |4
13      RXY LAB|             MISSING SKU DOC           |17
14      QXR LAB|              MISSING SKU DOC          |1
16      RXY LAB|               NEED HARDWARE           |1
17      RXY LAB|                     NEED RA           |1
24      QXR LAB|                NEED HARDWARE          |1

Finally, I try to use either

TagsFinal.groupby(['Lab Location', 'TagSplit'])['WorkItemCnt'].sum()

or

TagsFinal     =  TagsConcat.groupby(['Lab Location', 'TagSplit']).agg({'WorkItemCnt': ['sum']})

and I get this error:

KeyError: 'WorkItemCnt'

Upvotes: 1

Views: 409

Answers (1)

jezrael
jezrael

Reputation: 862771

I think your code is possible simplify - first are splitted column Tags with DataFrame.explode, and then is aggregate counts by GroupBy.size:

TagsFinal =  (df.assign(TagSplit = df['Tags'].str.split('; '))
                .explode('TagSplit')
                .groupby(['Labs', 'TagSplit'])
                .size()
                .reset_index(name='WorkItemCnt'))
                    

print (TagsFinal)
      Labs                     TagSplit  WorkItemCnt
0  QXR Lab  MISSING QUAL INTAKE REQUEST            1
1  QXR Lab              MISSING SKU DOC            1
2  QXR Lab                NEED HARDWARE            1
3  RXY Lab                        A2040            1
4  RXY Lab      AWAITING COMMODITY QUAL            1
5  RXY Lab                          DNR            2
6  RXY Lab              MISSING SKU DOC            4
7  RXY Lab                NEED HARDWARE            2
8  RXY Lab                      NEED RA            1
9  VYZ Lab                NEED HARDWARE            1

Upvotes: 1

Related Questions