Mike H
Mike H

Reputation: 29

Pandas Concat to Multiindex on Columns

Been playing around with this for a while. I'm working with test data, of which I have a series of test points, a series of sensors, and for each sensor I have min/max/avg/stdev data.

I had played around with the idea of simply appending e.g. "_min" on the end of each tag and creating a dataframe of nColumns*3 width. But... that feels hacky -- and when I call to plot the variables, I'm going to have to process the string value to add that suffix on... feels clumsy.

It seems that a multiindex is the right way to do it, which would allow me to handle the sensor name, and the measurement individually.

I'm currently reading in the data like:

data = pd.read_excel(os.path.join(working_path, working_dir, staticDataFileName), 
       sheet_name='sheet', skiprows = 6, nrows=2000,  usecols = 'A,D:G', names = ["Tag", "Min", "Max", "Avg", "Stdev"], dtype={'Tag': str})

I'm then splitting the dataframe into each individual variable.

df_min = data[["Tag", "Min"]]
...

I currently have some code working where I only have a single average value.

temp = readRawData(wd, f, dataset)
# Drop the bad rows
temp.drop(temp.index[temp['Tag'] == '0'], inplace = True)
temp2 = temp.T
temp2.rename(columns=temp2.iloc[0], inplace = True)
temp2.drop(temp2.index[0], inplace = True)

I need to transpose the dataframe to get the tag names as columns, and then set the columns to the tag names. I then drop the first index, which now is just the tag names. In my code, I am looping over all files, and create the dataframe for all datapoints with

data = pd.concat([data, temp2])

Somewhere in there, I need to figure out how to create this multiindex dataframe. Most of the examples given in the pandas user guide LINK have the indices as multi-level, not columns. The example they give.. I'm having a hard time following.

I'm looking for guidance on how to take a series of dataframe which look like

df_min
   Tag1   Tag2  TagN
0  min1   min2  minN

df_avg
   Tag1   Tag2  TagN
0  avg1   avg2  avgN

and combine them into

df
    Tag1                 Tag2     ...      TagN
   Min   Max Avg    Min   Max  Avg      Min Max Avg 
0  min1 max1 avg1   min2 max2 avg2      minN maxN avgN

Of course, is this is a terrible idea, please let me know. Thanks!

Upvotes: 0

Views: 61

Answers (1)

Mike H
Mike H

Reputation: 29

I was able to make this work by using a solution here: https://stackoverflow.com/a/47338266/14066896

It's not pretty... but it seems to be working

        for f in staticDataFileName:
            
            temp_all = readRawData(wd, f)
            temp_all.drop(temp_all.index[temp_all['Tag'] == '0'], inplace = True)
            
            column_list = []
            steady_dict = dict()
            temp = temp_all.T

            temp.rename(columns=temp.iloc[0], inplace=True)
            temp.drop(temp.index[0], inplace=True)
            temp.reset_index(inplace=True)
            temp.drop(columns=['index'], inplace=True)

            #create column names 
            for column in temp.columns:
                column_list.append((column, "Min"))
                column_list.append((column, "Max"))
                column_list.append((column, "Avg"))
                column_list.append((column, "Stdev"))

            j = 0
            for columnName, columnData in temp.iteritems():
                temp_dict = dict()
                temp_dict["Min"] = temp.iloc[0, j]
                temp_dict["Max"] = temp.iloc[1, j]
                temp_dict["Avg"] = temp.iloc[2, j]
                temp_dict["Stdev"] = temp.iloc[3, j]
                j += 1
                steady_dict[columnName] = temp_dict

            
            t = pd.DataFrame(steady_dict).unstack().to_frame().T           
            t.columns = pd.MultiIndex.from_tuples(column_list)
            #correctStaticData(temp2, wd2)
            
            data = pd.concat([data, t])           

Upvotes: 1

Related Questions