Reputation: 29
Been playing around with this for a while. I'm working with test data, of which I have a series of test points, a series of sensors, and for each sensor I have min/max/avg/stdev data.
I had played around with the idea of simply appending e.g. "_min" on the end of each tag and creating a dataframe of nColumns*3 width. But... that feels hacky -- and when I call to plot the variables, I'm going to have to process the string value to add that suffix on... feels clumsy.
It seems that a multiindex is the right way to do it, which would allow me to handle the sensor name, and the measurement individually.
I'm currently reading in the data like:
data = pd.read_excel(os.path.join(working_path, working_dir, staticDataFileName),
sheet_name='sheet', skiprows = 6, nrows=2000, usecols = 'A,D:G', names = ["Tag", "Min", "Max", "Avg", "Stdev"], dtype={'Tag': str})
I'm then splitting the dataframe into each individual variable.
df_min = data[["Tag", "Min"]]
...
I currently have some code working where I only have a single average value.
temp = readRawData(wd, f, dataset)
# Drop the bad rows
temp.drop(temp.index[temp['Tag'] == '0'], inplace = True)
temp2 = temp.T
temp2.rename(columns=temp2.iloc[0], inplace = True)
temp2.drop(temp2.index[0], inplace = True)
I need to transpose the dataframe to get the tag names as columns, and then set the columns to the tag names. I then drop the first index, which now is just the tag names. In my code, I am looping over all files, and create the dataframe for all datapoints with
data = pd.concat([data, temp2])
Somewhere in there, I need to figure out how to create this multiindex dataframe. Most of the examples given in the pandas user guide LINK have the indices as multi-level, not columns. The example they give.. I'm having a hard time following.
I'm looking for guidance on how to take a series of dataframe which look like
df_min
Tag1 Tag2 TagN
0 min1 min2 minN
df_avg
Tag1 Tag2 TagN
0 avg1 avg2 avgN
and combine them into
df
Tag1 Tag2 ... TagN
Min Max Avg Min Max Avg Min Max Avg
0 min1 max1 avg1 min2 max2 avg2 minN maxN avgN
Of course, is this is a terrible idea, please let me know. Thanks!
Upvotes: 0
Views: 61
Reputation: 29
I was able to make this work by using a solution here: https://stackoverflow.com/a/47338266/14066896
It's not pretty... but it seems to be working
for f in staticDataFileName:
temp_all = readRawData(wd, f)
temp_all.drop(temp_all.index[temp_all['Tag'] == '0'], inplace = True)
column_list = []
steady_dict = dict()
temp = temp_all.T
temp.rename(columns=temp.iloc[0], inplace=True)
temp.drop(temp.index[0], inplace=True)
temp.reset_index(inplace=True)
temp.drop(columns=['index'], inplace=True)
#create column names
for column in temp.columns:
column_list.append((column, "Min"))
column_list.append((column, "Max"))
column_list.append((column, "Avg"))
column_list.append((column, "Stdev"))
j = 0
for columnName, columnData in temp.iteritems():
temp_dict = dict()
temp_dict["Min"] = temp.iloc[0, j]
temp_dict["Max"] = temp.iloc[1, j]
temp_dict["Avg"] = temp.iloc[2, j]
temp_dict["Stdev"] = temp.iloc[3, j]
j += 1
steady_dict[columnName] = temp_dict
t = pd.DataFrame(steady_dict).unstack().to_frame().T
t.columns = pd.MultiIndex.from_tuples(column_list)
#correctStaticData(temp2, wd2)
data = pd.concat([data, t])
Upvotes: 1