Reputation: 4933
I have a data set that's sort of like this (first lines shown):
Sample Detector Cq
P_1 106 23.53152
P_1 106 23.152458
P_1 106 23.685083
P_1 135 24.465698
P_1 135 23.86892
P_1 135 23.723469
P_1 17 22.524242
P_1 17 20.658733
P_1 17 21.146122
Both "Sample" and "Detector" columns contain duplicated values ("Cq" is unique): to be precise, each "Detector" appears 3 times for each sample, because it's a replicate in the data.
What I need to do is to:
I thought that DataFrame.pivot
would do the trick, but it fails because of the duplicate data. What would be the best approach? Rename the duplicates, then reshape, or is there a better option?
EDIT: I thought over it and I think it's better to state the purpose. I need to store for each "Sample" the mean and standard deviation of their "Detector".
Upvotes: 4
Views: 1552
Reputation: 88
It looks like what you may be looking for is a hierarchical indexed dataframe [link].
Would something like this work?
#build a sample dataframe
a=['P_1']*9
b=[106,106,106,135,135,135,17,17,17]
c = np.random.randint(1,100,9)
df = pandas.DataFrame(data=zip(a,b,c), columns=['sample','detector','cq'])
#add a repetition number column
df['rep_num']=[1,2,3]*( len(df)/3 )
#Convert to a multi-indexed DF
df_multi = df.set_index(['sample','detector','rep_num'])
#--------------Resulting Dataframe---------------------
cq
sample detector rep_num
P_1 106 1 97
2 83
3 81
135 1 46
2 92
3 89
17 1 58
2 26
3 75
Upvotes: 5