Boreal Coder
Boreal Coder

Reputation: 107

Pandas: condense survey results

I have survey results that have the following format:

   Lab Shepherd Poodle Frisky Sleepy Hungry
0  NaN      NaN    Yes    Yes    NaN    NaN
1  Yes      NaN    NaN    NaN    NaN    Yes
2  NaN      Yes    NaN    NaN    Yes    NaN

I would like to transform my dataframe to the following:

      Breed Disposition
0    Poodle      Frisky
1       Lab      Hungry
2  Shepherd      Sleepy

What is the most effective way to accomplish this? Groupby? Pivot?

Is there any advantage to replacing every instance of 'Yes' with 'True' and use Boolean tests?

Thanks in advance. DataFrame code follows, for ease of answering.

d = [['', '','Yes','Yes','',''],['Yes', '','','','','Yes'],['','Yes','','','Yes','']]
df = pd.DataFrame(data=d,columns=['Lab','Shepherd','Poodle','Frisky','Sleepy','Hungry'])
df.replace(r'^\s*$', np.nan, regex=True, inplace = True)

Upvotes: 1

Views: 193

Answers (2)

BENY
BENY

Reputation: 323226

There is another one

pd.DataFrame([df.columns.values[x] for x in df.notnull().values])
Out[145]: 
          0       1
0    Poodle  Frisky
1       Lab  Hungry
2  Shepherd  Sleepy

Upvotes: 1

jezrael
jezrael

Reputation: 862481

You can reshape by stack and create new DataFrame by groupby with list and DataFrame constructor:

a = df.stack().reset_index().groupby('level_0')['level_1'].apply(list)
df = pd.DataFrame(a.values.tolist())

Another solution is convert to bool - empty spaces are Falses and for each row create list, last use DataFrame constructor too:

a = df.astype(bool).apply(lambda x: x.index[x].values.tolist(), axis=1)
df = pd.DataFrame(a.values.tolist())

df = df.astype(bool).apply(lambda x: pd.Series(x.index[x].values), axis=1)

print (df)
          0       1
0    Poodle  Frisky
1       Lab  Hungry
2  Shepherd  Sleepy

Upvotes: 2

Related Questions