Creating multiple dataframes at once in a for loop

Question

I have a script that checks features within a GIS database for missing values in fields. If the value is missing it appends it to a dataframe with the top column being the name of the feature. Ideally I'd like to split this up so that the name of the feature becomes the sheet name but I'm not sure how to do this iteratively. It needs to be noted that not every feature will have a missing value, different features at different times may or may not have a missing value, that's the point of doing this check.

df = pd.DataFrame()

for dst, dstkey in zip(Dst, DstKey):
    with arcpy.da.SearchCursor(dst, ("OBJECTID", dstkey)) as cursor:
        #returns an iterator of tuples
        for row in cursor:
            if (row[1] is None or not str(row[1]).strip()):
                df = df.append(pd.DataFrame({dst.split("\").pop(): str(row[0])}, index=[0]), ignore_index=True)

This returns me a single dataframe. Ideally, I'd like to have multiple dataframes exported to Excel with dst as the sheet_name. The problem here being that I won't know how many (if any) features will have an empty value.

I have tried creating a blank dataframe for each feature but I can't figure out how utilizing this would work in the above block of code.

d = {dst.split("\").pop().split(".")[2]: pd.DataFrame() for dst in Dst}

It's worth noting that Dst is a list of paths to a SQL database and DstKey is a field within each database I'm checking for.

Parfait · Accepted Answer

Consider using a dictionary of data frames with dst as the key building an inner list of data frames that are concatenated outside the loop:

df_dict = {}

for dst, dstkey in zip(Dst, DstKey):
    inner = []
    with arcpy.da.SearchCursor(dst, ("OBJECTID", dstkey)) as cursor:
        # returns an iterator of tuples
        for row in cursor:
            if (row[1] is None or not str(row[1]).strip()):
                inner.append(pd.DataFrame({dst.split("\").pop(): str(row[0])}, index=[0])

    df_dict[dstkey] = pd.concat(inner, ignore_index=True)

Alternatively with list comprehension:

df_dict = {}

for dst, dstkey in zip(Dst, DstKey):
    with arcpy.da.SearchCursor(dst, ("OBJECTID", dstkey)) as cursor:
        # returns an iterator of tuples
        inner = [pd.DataFrame({dst.split("\").pop(): str(row[0])}, index=[0]) 
                 for row in cursor if (row[1] is None or not str(row[1]).strip())]

    df_dict[dstkey] = pd.concat(inner, ignore_index=True)

For Excel export using dictionary of data frames:

writer = pd.ExcelWriter('/path/to/output.xlsx')

for i, df in df_dict.items():
   df.to_excel(writer, sheet_name=i)

writer.save()

Creating multiple dataframes at once in a for loop

Answers (1)

Related Questions