Reputation: 97
I have a table of two columns with a many-to-many relationship.
e.g.
Animal Food
rabbit grass
rabbit carrots
rabbit cabbage
dog carrots
horse grass
horse hay
I want something like this:
Animal Food1 Food2 Food3
rabbit grass carrots cabbage
dog carrots
horse grass hay
or something like this:
Animal Grass Carrots Cabbage Hay
rabbit True True True False
dog False True False False
horse True False False True
I've tried pivot table, melt and stack, but still cannot figure out how to do this. Any help will be appreciated. Thanks!!
Upvotes: 4
Views: 157
Reputation: 294308
Option 1
groupby
with size
df.groupby(['Animal', 'Food']).size().unstack(fill_value=0).astype(bool)
Food cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
s = df.groupby('Animal').Food.apply(list)
pd.DataFrame(s.values.tolist(), s.index).add_prefix('Food').fillna('')
Food0 Food1 Food2
Animal
dog carrots
horse grass hay
rabbit grass carrots cabbage
Option 2
groupby
with value_counts
df.groupby('Animal').Food.value_counts().unstack(fill_value=0).astype(bool)
Food cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
Option 3
groupby
and str.get_dummies
df.groupby('Animal').Food.apply('|'.join).str.get_dummies().astype(bool)
cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
Option 4
pandas.factorize
with numpy.bincount
f1, u1 = pd.factorize(df.Animal.values)
f2, u2 = pd.factorize(df.Food.values)
n = u1.size
m = u2.size
b = np.bincount(f1 * m + f2, minlength=n * m).reshape(n, m)
pd.DataFrame(b.astype(bool), u1, u2)
grass carrots cabbage hay
rabbit True True True False
dog False True False False
horse True False False True
Option 5
was bored... so came up with more
f, u = pd.factorize(df.Animal.values)
n = u.size
a = [[] for _ in range(n)]
[a[i].append(food) for i, food in zip(f, df.Food)];
pd.DataFrame(a, u).rename(columns=lambda x: x+1).add_prefix('Food').fillna('')
Food1 Food2 Food3
rabbit grass carrots cabbage
dog carrots
horse grass hay
Upvotes: 4
Reputation: 153460
df_out = df.set_index(['Animal','Food']).assign(Value=True).unstack().fillna(False)
df_out.columns = df_out.columns.droplevel()
Output:
Food cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
Upvotes: 4
Reputation: 38415
You can use crosstab
pd.crosstab(df.Animal, df.Food).astype(bool)
Food cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
Upvotes: 3
Reputation: 862731
You can use:
df = pd.pivot(index=df['Animal'],
columns=df.groupby('Animal').cumcount().add(1).astype(str),
values=df['Food'])
.add_prefix('Food').fillna('')
print (df)
Food1 Food2 Food3
Animal
dog carrots
horse grass hay
rabbit grass carrots cabbage
Or:
df = pd.crosstab(df['Animal'], df['Food']).astype(bool)
print (df)
Food cabbage carrots grass hay
Animal
dog False True False False
horse False False True True
rabbit True True True False
Upvotes: 4