Pandas multiindex from series of dataframes

Question

I have a series of dataframes with identical structure that represent results of a simulation for each hour of the year. Each simulation contains results for a series of coordinates (x,y).

Each dataframe is imported from a csv file that has time information only in the file name. Example:

results_YYMMDDHH.csv

contains data such

   x   y         a         b
 0.0 0.0  0.318705 -0.871259
 0.1 0.0 -0.937012  0.704270
 0.1 0.1 -0.032225 -1.939544
 0.0 0.1 -1.874781 -0.033073

I would like to create a single MultiIndexed Dataframe (level 0 is time and level 1 is (x,y)) that would allow me to perform various operations like averages, sums, max, etc. between these dataframes using the resampling or groupby methods. For each time step

The resulting dataframe should look something like this

                       x   y         a         b
2010-01-01 10:00     0.0 0.0  0.318705 -0.871259
                     0.1 0.0 -0.934512  0.745270
                     0.1 0.1 -0.0334525 -1.963544
                     0.0 0.1 -1.835781 -0.067573

2010-01-01 11:00     0.0 0.0  0.318705 -0.871259
                     0.1 0.0 -0.923012  0.745670
                     0.1 0.1 -0.035225 -1.963544
                     0.0 0.1 -1.835781 -0.067573
.................
.................
2010-12-01 10:00     0.0 0.0  0.318705 -0.871259
                     0.1 0.0 -0.923012  0.723270
                     0.1 0.1 -0.034225 -1.963234
                     0.0 0.1 -1.835781 -0.067233

You can imagine this for each hour of the year. I would like now to be able to calculate for example the average for the whole year or the average for June. Also any other function like the number of hours above a certain threshold or between a min and a max value. Please bear in mind that the result should be in any of these operations a DataFrame. For example the monthly averages should look like

              x   y     a     b
2010-01     0.0 0.0  0.45 -0.13
2010-02     0.1 0.0  0.55 -0.87
2010-03     0.1 0.1  0.24 -0.83
2010-04     0.0 0.1  0.11 -0.87

How do I build this MultiIndexed dataframe? I picture this like a timeseries of dataframes.

Brian from QuantRocket · Accepted Answer

Here is a different answer from my earlier one, in light of the more fully explained question. Iterate through the files and read them into pandas, parse the date and add it to the dataframe, then use set_index to create your multiindex. Once you've got all your dataframes, use pd.concat to combine them:

dataframes = []
for filename in filenames:
    df = pd.read_csv(filename)
    df["datetime"] = datetime.datetime.strptime(filename[8:18], "%Y%m%d%H")
    dataframes.append(df.set_index(["datetime","x", "y"]))

combined_df = pd.concat(dataframes)

Pandas multiindex from series of dataframes

Answers (2)

Related Questions