chowpay
chowpay

Reputation: 1687

Appending dataframe total to new dataframe

I'm using pandas to calculate the sum of a column where "anystring" is in the filename and put add a Row "Totals_string" with the sum the requests for named "any string"

df = pd.read_sql("select count(*) as requests,\
filename,\
file_extension,\
date_trunc('day', log_time) as date\
from "+dbase+"\
where\
filename like '%anyStringA%'\
OR\
filename like '%anyStringB%'\
OR\
filename like '%anyStringC%'\
and\
file_extension not in ('gif')\
group by filename,file_extension,date_trunc('day', log_time)\
order by requests desc",conn)

This creates a df like this:

requests    filename    Extension   date
0   10  xxanyStringAxx  .jpg    2018-10-31
1   8   xxanyStringBxx  .jpg    2018-10-31
2   11  xxanyStringCxx  .jpg    2018-10-31
3   12  xxxanyStringAxx .jpg    2018-10-31
.

I would like to create a new DF with the totals of each condition for example the data frame would look like

Totals_anyStringA = 22
Totals_anyStringB = 8
Totals_anyStringC = 11

As a suggestion from @sacul I did this:

Totals_df = (df.groupby(df.url_info_filename.str.extract('anyName(.*)')\
                        .requests.sum()\
                        .add_prefix('Total_')\
                        .to_frame()))

My error was :

AttributeError: 'Series' object has no attribute 'requests'

*also I only don't know how to do a new row for each of the different names, as you can see I was just testing with anyStringA but it should have all the names of interest in there somehow.

Thanks

Upvotes: 0

Views: 41

Answers (1)

sacuL
sacuL

Reputation: 51365

IIUC, you want to extract the string following Album and preceding the underscore _ from filename, then groupby that, get the sum, and put it into a new table. If that's the case, you can do something like this (I've added some arbitrary albums to illustrate):

>>> df
   Requests    filename Extension        date
0    914208  AlbumA_100      .jpg  2018-10-31
1     73795  AlbumA_132      .jpg  2018-10-31
2     39651  AlbumA_130      .jpg  2018-10-31
3        10  AlbumB_130      .jpg  2018-10-31
4        15  AlbumB_135      .jpg  2018-10-31
4        85  AlbumC_135      .jpg  2018-10-31
4        15   AlbumC_13      .jpg  2018-10-31

totals_df = (df.groupby(df.filename.str.extract('Album(.*)_'))
             .Requests.sum()
             .add_prefix('Total_')
             .to_frame())

>>> totals_df
          Requests
filename          
Total_A    1027654
Total_B         25
Total_C        100

Upvotes: 1

Related Questions