Reputation: 351
I am a new Python programmer, and I want to go from this dictionary
dic = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
to a DataFrame object.
I tried code like this:
df = pd.DataFrame(index=["a","b","c","d","e","f"], columns=[])
for i in result:
    print("i", i)
    print("v", v)
    df2 = pd.DataFrame(i)
    df.append(df2)
Please help me: how should I code this?
Upvotes: 0
Views: 198
Reputation: 863176
First convert the dict to a Series, then use MultiLabelBinarizer with the DataFrame constructor, and finally cast to boolean:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer
d = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
s = pd.Series(d)
mlb = MultiLabelBinarizer()
df = pd.DataFrame(mlb.fit_transform(s), columns=mlb.classes_, index=s.index).astype(bool)
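To make the MultiLabelBinarizer step less opaque, here is a minimal sketch that stops before the DataFrame constructor and looks at what the binarizer returns. The names `indicator`, `labels` and `rows` are introduced only for this illustration and are not part of the answer above.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

d = {"word1": ["a", "b", "c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
s = pd.Series(d)

mlb = MultiLabelBinarizer()
# fit_transform learns the distinct labels and returns a 0/1 indicator array:
# one row per list in s, one column per distinct label
indicator = mlb.fit_transform(s)

labels = list(mlb.classes_)  # the distinct labels, in sorted order
rows = indicator.tolist()    # plain nested lists, easier to inspect

print(labels)
print(rows)
```

The array itself carries no row or column names, which is why the answer passes `columns=mlb.classes_` and `index=s.index` to the DataFrame constructor.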
Another solution uses str.join to join each list with |, which is the default separator in str.get_dummies:
df = s.str.join('|').str.get_dummies().astype(bool)
print(df)
           a      b      c      d      e      f
word1   True   True   True  False  False  False
word2  False   True  False   True   True  False
word3   True  False   True  False  False   True
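As a rough illustration of why the join step is needed, the sketch below splits the one-liner into its intermediate values. The variable names `joined` and `dummies` are made up for this example.

```python
import pandas as pd

d = {"word1": ["a", "b", "c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
s = pd.Series(d)

# Each list becomes a single string such as "a|b|c"
joined = s.str.join("|")

# str.get_dummies splits on "|" by default and builds one 0/1 column per token
dummies = joined.str.get_dummies()

# Cast the 0/1 integers to booleans
df = dummies.astype(bool)

print(joined)
print(df)
```

Note that this approach assumes none of the list items contains the | character themselves; if they might, a different separator would have to be passed to both `str.join` and `str.get_dummies`.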
Upvotes: 2
Reputation: 164773
Here is one way using pd.get_dummies:
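import pandas as pd
d = {"word1": ["a","b","c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}
df = pd.DataFrame.from_dict(d, orient='index')
# collect each row's items into a single list-valued column
df['values'] = df.values.tolist()
# drop every column to keep only the index, then join the indicator columns;
# groupby(level=0).sum() replaces sum(level=0), which pandas 2.0 removed
df = df.drop(columns=df.columns)\
       .join(pd.get_dummies(df['values'].apply(pd.Series).stack()).groupby(level=0).sum())\
       .astype(bool)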
Result
           a      b      c      d      e      f
word1   True   True   True  False  False  False
word2  False   True  False   True   True  False
word3   True  False   True  False  False   True
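To see what each stage of the chained expression produces, here is a sketch that breaks it into named steps. It starts from a Series of lists directly rather than going through `from_dict`, and the names `lists`, `stacked`, `counts` and `result` are introduced only for this illustration.

```python
import pandas as pd

d = {"word1": ["a", "b", "c"], "word2": ["b", "d", "e"], "word3": ["a", "f", "c"]}

# One list per word
lists = pd.Series(d)

# Spread each list over columns, then stack into one long Series
# indexed by (word, position)
stacked = lists.apply(pd.Series).stack()

# One indicator column per letter, then collapse the positions
# back to a single row per word
counts = pd.get_dummies(stacked).groupby(level=0).sum()

# Counts are 0 or 1 here, so casting to bool only changes the display
result = counts.astype(bool)

print(stacked)
print(result)
```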
Explanation
Create a pd.Series of lists for each word.
Apply pd.get_dummies to this series with some manipulation.
Convert int to bool for display purposes.
Upvotes: 1