Stefano Potter

Reputation: 3577

Plotting to separate pdf files based on groupby

I have a dataframe like this:

    NDVI   Value   Allotment   Date
0      0  0.208430  Arnstson  19840517
1      0  0.211430  Arnstson  19840517
2      0  0.214430  Arnstson  19840517
3      2  0.217430  Arnstson  19840517
4      4  0.220430  Arnstson  19840517
5      1  0.223430  Arnstson  19840517
6      6  0.226430  Arnstson  19840517
7      1  0.229430  Arnstson  19840517
8     11  0.232430  Arnstson  19840517
9     13  0.235430  Arnstson  19840517
10    17  0.238430  Arnstson  19840517
11     9  0.241430  Arnstson  19840517
12     9  0.244430  Arnstson  19840517
13     7  0.247430  Arnstson  19840517
14    22  0.250430  Woodlot   19840517
15    17  0.253430  Woodlot   19840517
16    14  0.256430  Woodlot   19840517
17     5  0.259430  Woodlot   19840517
18    14  0.262430  Woodlot   19840517
19    19  0.265430  Woodlot   19840517
20    10  0.268430  Woodlot   19840517
21    11  0.271430  Arnstson  19840518
22    10  0.274430  Arnstson  19840518
23     9  0.277430  Arnstson  19840518
24     9  0.280430  Arnstson  19840518
25     5  0.283430  Woodlot   19840518
26     7  0.286430  Woodlot   19840518
27     1  0.289430  Woodlot   19840518
28    11  0.292430  Woodlot   19840518
29     6  0.295430  Woodlot   19840518

and I want to create plots grouped by Allotment, with each Allotment written to its own pdf file. Each file should contain one plot of NDVI vs. Value for each Date in that Allotment. I can do this easily for an individual Allotment with this code:

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

group=df.groupby(['Allotment'])
Arnstson=group.get_group('Arnstson')

with PdfPages(r'C:\delete.pdf') as pdf:
    for i, group in Arnstson.groupby(['Allotment', 'Date']):
        plot=group.plot(x='Value', y='NDVI', title=str(i)).get_figure()
        pdf.savefig(plot)  
        plt.close(plot)

but I have 53 unique names in Allotment and would prefer not to have to select all of them individually.

Upvotes: 0

Views: 175

Answers (1)

Kevin S

Reputation: 2803

One strategy is to open all of the PDF files, write to the appropriate ones, and then close them. Here I use a dictionary, keyed by Allotment, to track the open file handles. Each file is opened the first time its Allotment appears, and all handles are closed in a separate step at the end.

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

pdf_files = {}
for group_name, group in df.groupby(['Allotment', 'Date']):
    allotment, date = group_name
    if allotment not in pdf_files:
        pdf_files[allotment] = PdfPages('C:\\' + allotment + '.pdf') 
    plot=group.plot(x='Value', y='NDVI', title=str(group_name)).get_figure()
    pdf_files[allotment].savefig(plot)
    plt.close(plot)

for key in pdf_files:
    pdf_files[key].close()
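
If plotting raises partway through the loop, the closing loop never runs and some files are left open. A minimal sketch of one way to guard against that, assuming the standard library's `contextlib.ExitStack`; the function name `write_pdfs_by_allotment` and the `out_dir` parameter are hypothetical and not part of the code above:

```python
import os
from contextlib import ExitStack

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no plot window needed
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.backends.backend_pdf import PdfPages


def write_pdfs_by_allotment(df, out_dir):
    """Write one PDF per Allotment, one page per Date; return {allotment: path}."""
    paths = {}
    pdf_files = {}
    # ExitStack closes every PdfPages it was given, even if plotting raises.
    with ExitStack() as stack:
        for (allotment, date), group in df.groupby(["Allotment", "Date"]):
            if allotment not in pdf_files:
                paths[allotment] = os.path.join(out_dir, f"{allotment}.pdf")
                pdf_files[allotment] = stack.enter_context(PdfPages(paths[allotment]))
            fig = group.plot(x="Value", y="NDVI", title=f"{allotment} {date}").get_figure()
            pdf_files[allotment].savefig(fig)
            plt.close(fig)
    return paths
```

Calling `write_pdfs_by_allotment(df, some_directory)` would then produce the same files as the loop above, with the cleanup handled by the `with` block instead of a second loop.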

An alternative would be to use a nested groupby, where the outer groupby uses Allotment and the inner one (applied to each group from the outer) uses Date. This allows each file to be opened and closed one at a time, which would be better if there were potentially lots of Allotments.

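import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Outer loop: one PDF per Allotment. Grouping by a plain string (not a list)
# keeps `allotment` a string, so it can be concatenated into the file name.
for allotment, outer_group in df.groupby('Allotment'):
    with PdfPages('C:\\' + allotment + '.pdf') as pdf:
        # Inner loop: one page per Date within this Allotment.
        for date, inner_group in outer_group.groupby('Date'):
            plot = inner_group.plot(x='Value', y='NDVI', title=str((allotment, date))).get_figure()
            pdf.savefig(plot)
            plt.close(plot)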

This version is a little shorter, although it does involve nested loops. I prefer the second, as it seems a bit clearer as well.

Upvotes: 1
