Reputation: 107
I have survey results that have the following format:
Lab Shepherd Poodle Frisky Sleepy Hungry
0 NaN NaN Yes Yes NaN NaN
1 Yes NaN NaN NaN NaN Yes
2 NaN Yes NaN NaN Yes NaN
I would like to transform my dataframe to the following:
Breed Disposition
0 Poodle Frisky
1 Lab Hungry
2 Shepherd Sleepy
What is the most effective way to accomplish this? Groupby? Pivot?
Is there any advantage to replacing every instance of 'Yes' with 'True' and use Boolean tests?
Thanks in advance. DataFrame code follows, for ease of answering.
d = [['', '','Yes','Yes','',''],['Yes', '','','','','Yes'],['','Yes','','','Yes','']]
df = pd.DataFrame(data=d,columns=['Lab','Shepherd','Poodle','Frisky','Sleepy','Hungry'])
df.replace(r'^\s*$', np.nan, regex=True, inplace = True)
Upvotes: 1
Views: 193
Reputation: 323226
There is another one
pd.DataFrame([df.columns.values[x] for x in df.notnull().values])
Out[145]:
0 1
0 Poodle Frisky
1 Lab Hungry
2 Shepherd Sleepy
Upvotes: 1
Reputation: 862481
You can reshape by stack
and create new DataFrame
by groupby
with list
and DataFrame
constructor:
a = df.stack().reset_index().groupby('level_0')['level_1'].apply(list)
df = pd.DataFrame(a.values.tolist())
Another solution is convert to bool
- empty spaces are False
s and for each row create list, last use DataFrame
constructor too:
a = df.astype(bool).apply(lambda x: x.index[x].values.tolist(), axis=1)
df = pd.DataFrame(a.values.tolist())
df = df.astype(bool).apply(lambda x: pd.Series(x.index[x].values), axis=1)
print (df)
0 1
0 Poodle Frisky
1 Lab Hungry
2 Shepherd Sleepy
Upvotes: 2