Groupby and Sample pandas

Question

I am trying to sample the resulting data after doing a groupby on multiple columns. If the respective groupby has more than 2 elements, I want to take sample 2 records, else take all the records

df:

col1   col2   col3   col4
A1     A2     A3     A4
A1     A2     A3     A5
A1     A2     A3     A6
B1     B2     B3     B4
B1     B2     B3     B5
C1     C2     C3     C4

target df:

col1   col2   col3   col4
A1     A2     A3     A4 or A5 or A6
A1     A2     A3     A4 or A5 or A6
B1     B2     B3     B4
B1     B2     B3     B5
C1     C2     C3     C4

I have mentioned A4 or A5 or A6 because, when we take sample, either of the three might return

This is what i have tried so far:

trial = pd.DataFrame(df.groupby(['col1', 'col2','col3'])['col4'].apply(lambda x: x if (len(x) <=2) else x.sample(2)))

However, in this I do not get col1, col2 and col3

jezrael · Accepted Answer

I think need double reset_index - first for remove 3.rd level of MultiIndex and second for convert MultiIndex to columns:

trial= (df.groupby(['col1', 'col2','col3'])['col4']
        .apply(lambda x: x if (len(x) <=2) else x.sample(2))
        .reset_index(level=3, drop=True)
        .reset_index())

Or reset_index with drop for remove column level_3:

trial= (df.groupby(['col1', 'col2','col3'])['col4']
        .apply(lambda x: x if (len(x) <=2) else x.sample(2))
        .reset_index()
        .drop('level_3', 1))

print (trial)
  col1 col2 col3 col4
0   A1   A2   A3   A4
1   A1   A2   A3   A6
2   B1   B2   B3   B4
3   B1   B2   B3   B5
4   C1   C2   C3   C4

Groupby and Sample pandas

Answers (2)

Related Questions