Multiple columns for the same index

Question

I have lists of stats produced in different runs for each of my samples:

d = {
    "sample1": [
        {"stat1": 'a', "stat2": 98},  # stats for sample1, 1st run
        {"stat1": 'z', "stat2": 13},  # stats for sample1, 2nd run
    ],
    "sample2": [
        {"stat1": 'y', "stat2": 1089},  # stats for sample2, 1st run
        {"stat1": 'a', "stat2": 1015},  # stats for sample2, 2nd run
    ],
}

I am trying to create a DataFrame from this so the stats are easy to manage. For example, I would like to see the average of stat2 for a given sample, or the most common stat1 value across all samples.

So df.loc["sample2"] would return all "rows" (runs) of stats for sample2, df.loc[("sample1", 3)] would return just the 4th run of sample1, df["stat1"] would of course return the entire column for all samples and runs, and df.loc["sample1"]["stat2"] would return the stat2 column for sample1. I hope I got the indexing right; I am not very familiar with pandas.

I can't manage to get it right. I have tried using pd.MultiIndex but that didn't really work:

index = pd.MultiIndex.from_tuples(???, names=['sample', 'run'])
df = pd.DataFrame(d, columns=['stat1', 'stat2'], index=index)

I have tried pairing each sample with its run numbers, like [("sample1", 0), ("sample1", 1), ("sample2", 0), ("sample2", 1)], but that didn't really work out because the number of runs won't always be the same for each sample.
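For what it's worth, the (sample, run) tuples don't have to be written by hand. A minimal sketch, assuming d has the shape shown above: the tuples are generated from d itself, so each sample can have a different number of runs, and the per-run dicts are flattened in the same order and passed as row records (so the stats dicts, not d, become the rows).

```python
import pandas as pd

d = {
    "sample1": [{"stat1": 'a', "stat2": 98}, {"stat1": 'z', "stat2": 13}],
    "sample2": [{"stat1": 'y', "stat2": 1089}, {"stat1": 'a', "stat2": 1015}],
}

# One (sample, run) pair per entry; range(len(runs)) adapts to each sample's run count
tuples = [(sample, run) for sample, runs in d.items() for run in range(len(runs))]
index = pd.MultiIndex.from_tuples(tuples, names=["sample", "run"])

# Flatten the per-run dicts in the same order as the tuples, one record per row
records = [stats for runs in d.values() for stats in runs]
df = pd.DataFrame(records, index=index, columns=["stat1", "stat2"])
print(df)
```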

Also, all values were NaN, so I must be doing something wrong when passing the data. Shouldn't passing d along with the proper index and columns be enough for the constructor to figure out how to populate the DataFrame? If not, how should I do it?
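The NaN result has a simple cause: when DataFrame receives a dict, its keys become the column labels, so here the columns would be sample1 and sample2. Passing columns=['stat1', 'stat2'] then asks for keys that d does not have, and pandas fills those columns with NaN instead of looking inside the lists. A minimal sketch reproducing this (the hard-coded index is only for illustration):

```python
import pandas as pd

d = {
    "sample1": [{"stat1": 'a', "stat2": 98}, {"stat1": 'z', "stat2": 13}],
    "sample2": [{"stat1": 'y', "stat2": 1089}, {"stat1": 'a', "stat2": 1015}],
}

# Without `columns`, the dict keys become the column labels
print(pd.DataFrame(d).columns.tolist())  # ['sample1', 'sample2']

# With columns=['stat1', 'stat2'], none of the requested keys exist in d,
# so both columns are created and filled with NaN
index = pd.MultiIndex.from_tuples(
    [("sample1", 0), ("sample1", 1), ("sample2", 0), ("sample2", 1)],
    names=["sample", "run"],
)
df_nan = pd.DataFrame(d, columns=["stat1", "stat2"], index=index)
print(df_nan.isna().all().all())  # True
```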

jezrael · Accepted Answer

I think you need concat with a dict comprehension; if you need to change the names of the MultiIndex levels, add rename_axis:

import pandas as pd

df = pd.concat({k: pd.DataFrame(v) for k, v in d.items()}).rename_axis(('sample', 'run'))
print(df)
            stat1  stat2
sample  run             
sample1 0       a     98
        1       z     13
sample2 0       y   1089
        1       a   1015
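With the frame from the answer, the operations from the question map onto groupby and loc roughly as follows. This is a sketch assuming the df built above; mode() is just one way to get the most common value (value_counts().idxmax() is another).

```python
import pandas as pd

d = {
    "sample1": [{"stat1": 'a', "stat2": 98}, {"stat1": 'z', "stat2": 13}],
    "sample2": [{"stat1": 'y', "stat2": 1089}, {"stat1": 'a', "stat2": 1015}],
}
df = pd.concat({k: pd.DataFrame(v) for k, v in d.items()}).rename_axis(('sample', 'run'))

# Average of stat2 per sample: group on the outer index level
mean_stat2 = df.groupby(level="sample")["stat2"].mean()
print(mean_stat2)

# Most common stat1 value across all samples and runs
most_common = df["stat1"].mode()[0]
print(most_common)  # a

# All runs of one sample, a single run (full tuple key), and one column for one sample
runs_sample2 = df.loc["sample2"]
one_run = df.loc[("sample1", 1)]
stat2_sample1 = df.loc["sample1"]["stat2"]
print(runs_sample2)
print(one_run)
print(stat2_sample1)
```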
