user664859
user664859

Reputation: 153

Python: Create a model for reports (using pandas)

This is more of a model design question with python.

I need to parse and extract data from several log files into a pandas DataFrames. From these dataframes I need to create reports (as csv, excel and so on).

One way of design such is to create a file with 2 functions: 1. function to extract data from log file (regex is fine) 2. function of pandas query, something like this:

def get_top1000(group):

  return group.sort_index(by='births', ascending=False)[:1000]
  grouped = names.groupby(['year', 'sex'])
  top1000 = grouped.apply(get_top1000)

Then, my class could get all these queries and produce the reports for this. How this can be implemented with python correctly?

Upvotes: 2

Views: 1593

Answers (1)

jezrael
jezrael

Reputation: 863166

You need SeriesGroupBy.nlargest:

df = names.groupby(['year', 'sex'])['births'].nlargest(1000)

Sample:

names = pd.DataFrame({'year':[2000,2000,2000,2000,2000],
                   'sex':['M','M','F','F','F'],
                   'births':[7,8,9,1,2]})

print (names)
   births sex  year
0       7   M  2000
1       8   M  2000
2       9   F  2000
3       1   F  2000
4       2   F  2000

df = names.groupby(['year', 'sex'])['births']
          .nlargest(1)
          .reset_index(level=2, drop=True)
          .reset_index()
print (df)
   year sex  births
0  2000   F       9
1  2000   M       8

If in your data there are other columns, first set_index with these columns:

names = pd.DataFrame({'year':[2000,2000,2000,2000,2000],
                   'sex':['M','M','F','F','F'],
                   'births':[7,8,9,1,2],
                    'val':[3,2,4,5,6]})

print (names)
   births sex  val  year
0       7   M    3  2000
1       8   M    2  2000
2       9   F    4  2000
3       1   F    5  2000
4       2   F    6  2000

df = names.set_index('val') \
          .groupby(['year', 'sex'])['births'] \
          .nlargest(1) \
          .reset_index()
print (df)
   year sex  val  births
0  2000   F    4       9
1  2000   M    2       8

Upvotes: 2

Related Questions