Reputation: 1687
I'm using pandas to calculate the sum of a column where "anystring" is in the filename and put add a Row "Totals_string" with the sum the requests for named "any string"
df = pd.read_sql("select count(*) as requests,\
filename,\
file_extension,\
date_trunc('day', log_time) as date\
from "+dbase+"\
where\
filename like '%anyStringA%'\
OR\
filename like '%anyStringB%'\
OR\
filename like '%anyStringC%'\
and\
file_extension not in ('gif')\
group by filename,file_extension,date_trunc('day', log_time)\
order by requests desc",conn)
This creates a df like this:
requests filename Extension date
0 10 xxanyStringAxx .jpg 2018-10-31
1 8 xxanyStringBxx .jpg 2018-10-31
2 11 xxanyStringCxx .jpg 2018-10-31
3 12 xxxanyStringAxx .jpg 2018-10-31
.
I would like to create a new DF with the totals of each condition for example the data frame would look like
Totals_anyStringA = 22
Totals_anyStringB = 8
Totals_anyStringC = 11
As a suggestion from @sacul I did this:
Totals_df = (df.groupby(df.url_info_filename.str.extract('anyName(.*)')\
.requests.sum()\
.add_prefix('Total_')\
.to_frame()))
My error was :
AttributeError: 'Series' object has no attribute 'requests'
*also I only don't know how to do a new row for each of the different names, as you can see I was just testing with anyStringA but it should have all the names of interest in there somehow.
Thanks
Upvotes: 0
Views: 41
Reputation: 51365
IIUC, you want to extract the string following Album
and preceding the underscore _
from filename, then groupby that, get the sum, and put it into a new table. If that's the case, you can do something like this (I've added some arbitrary albums to illustrate):
>>> df
Requests filename Extension date
0 914208 AlbumA_100 .jpg 2018-10-31
1 73795 AlbumA_132 .jpg 2018-10-31
2 39651 AlbumA_130 .jpg 2018-10-31
3 10 AlbumB_130 .jpg 2018-10-31
4 15 AlbumB_135 .jpg 2018-10-31
4 85 AlbumC_135 .jpg 2018-10-31
4 15 AlbumC_13 .jpg 2018-10-31
totals_df = (df.groupby(df.filename.str.extract('Album(.*)_'))
.Requests.sum()
.add_prefix('Total_')
.to_frame())
>>> totals_df
Requests
filename
Total_A 1027654
Total_B 25
Total_C 100
Upvotes: 1