Pandas: condense survey results

Question

I have survey results that have the following format:

   Lab Shepherd Poodle Frisky Sleepy Hungry
0  NaN      NaN    Yes    Yes    NaN    NaN
1  Yes      NaN    NaN    NaN    NaN    Yes
2  NaN      Yes    NaN    NaN    Yes    NaN

I would like to transform my dataframe to the following:

      Breed Disposition
0    Poodle      Frisky
1       Lab      Hungry
2  Shepherd      Sleepy

What is the most effective way to accomplish this? Groupby? Pivot?

Is there any advantage to replacing every instance of 'Yes' with 'True' and use Boolean tests?

Thanks in advance. DataFrame code follows, for ease of answering.

d = [['', '','Yes','Yes','',''],['Yes', '','','','','Yes'],['','Yes','','','Yes','']]
df = pd.DataFrame(data=d,columns=['Lab','Shepherd','Poodle','Frisky','Sleepy','Hungry'])
df.replace(r'^\s*$', np.nan, regex=True, inplace = True)

jezrael · Accepted Answer

You can reshape by stack and create new DataFrame by groupby with list and DataFrame constructor:

a = df.stack().reset_index().groupby('level_0')['level_1'].apply(list)
df = pd.DataFrame(a.values.tolist())

Another solution is convert to bool - empty spaces are Falses and for each row create list, last use DataFrame constructor too:

a = df.astype(bool).apply(lambda x: x.index[x].values.tolist(), axis=1)
df = pd.DataFrame(a.values.tolist())

df = df.astype(bool).apply(lambda x: pd.Series(x.index[x].values), axis=1)

print (df)
          0       1
0    Poodle  Frisky
1       Lab  Hungry
2  Shepherd  Sleepy

Pandas: condense survey results

Answers (2)

Related Questions