Reputation: 1195
I have the following dataframe as an example.
df_test = pd.DataFrame(data=None, index=["green","yellow","red","pink"], columns=["bear","dog","cat"], dtype=None, copy=False)
I have the following dictionary, whose keys and values are the same as or related to the index and columns of my dataframe.
d = {"green":["bear","dog"], "yellow":["bear"], "red":["bear"]}
I would like to fill my dataframe according to the keys and values in the dictionary, and if a key is not present I want to fill that row with empty.
Desired output
The only approach I can think of is building lists and looping over them. Is there an easier way to implement this, or are there functions that can help me?
Upvotes: 4
Views: 13023
Reputation: 2598
You can achieve what you want by doing:
import numpy as np
import pandas as pd

# index_list may contain labels that are not in the original dataframe;
# those rows will be filled with 'Empty'
index_list = ["green", "yellow", "red", "pink", "purple"]
replace_dict = {True: 'Yes', False: 'No', np.nan: 'Empty'}
df_test.loc[list(d.keys())].apply(lambda x: pd.Series(x.index.isin(d[x.name]),
    index=x.index), axis=1).reindex(index_list).replace(replace_dict)
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty
Explanation
You can accomplish what you want by checking whether the columns of the dataframe are present in the corresponding field of the dict:
df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
index=x.index), axis=1)
         bear    dog    cat
green    True   True  False
yellow   True  False  False
red      True  False  False
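To make the per-row check concrete, here is a minimal sketch of what the lambda computes for a single row. The names columns and green_row are only for illustration and are not part of the answer's code.

```python
import pandas as pd

d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
columns = pd.Index(["bear", "dog", "cat"])

# For the "green" row: mark each column that is listed under that key.
green_row = pd.Series(columns.isin(d["green"]), index=columns)
print(green_row)
```

The apply call with axis=1 repeats this check once per row, using x.name as the dict key.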
Then reindex with the full list of colors (index_list), so that colors missing from the dict appear as rows of NaN:
index_list = ["green","yellow","red","pink", "purple"]
df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
index=x.index), axis=1).reindex(index_list)
         bear    dog    cat
green    True   True  False
yellow   True  False  False
red      True  False  False
pink      NaN    NaN    NaN
purple    NaN    NaN    NaN
Then, if you want to change the values you can replace them by using a dictionary like this:
replace_dict = {True: 'Yes', False: 'No', np.nan:'Empty'}
df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
index=x.index), axis=1).reindex(index_list).replace(replace_dict)
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty
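For anyone who wants to run the three steps (membership check, relabel, reindex) end to end, here is a self-contained sketch. It is a variant and not the exact chain above: it assumes a plain list comprehension for the membership check, np.where for the labels, and the fill_value argument of reindex in place of replace.

```python
import numpy as np
import pandas as pd

d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
columns = ["bear", "dog", "cat"]
index_list = ["green", "yellow", "red", "pink", "purple"]

# Step 1: one boolean row per dict key, True where the column is listed.
matrix = np.array([[col in d[key] for col in columns] for key in d])

# Step 2: turn the booleans into 'Yes' / 'No' labels.
labels = pd.DataFrame(np.where(matrix, "Yes", "No"), index=list(d), columns=columns)

# Step 3: add the colors that have no dict entry, filled with 'Empty'.
result = labels.reindex(index_list, fill_value="Empty")
print(result)
```

Relabelling before the reindex means the missing rows never pass through NaN, so no np.nan key is needed in a replacement dict.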
Upvotes: 2
Reputation: 863226
Loop over the dictionary and set the matching cells to 'Yes', then use mask to replace the rows that are entirely missing with 'Empty', and finally use fillna to replace the remaining missing values with 'No':
for k, v in d.items():
    for x in v:
        df_test.loc[k, x] = 'Yes'
df_test = df_test.mask(df_test.isnull().all(axis=1), 'Empty').fillna('No')
print(df_test)
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
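The same loop, mask and fillna steps can be wrapped so that the original frame is left untouched. This is only a sketch: the function name fill_from_dict is made up for illustration, and it assumes every key and value in the dict already exists in the frame's index and columns.

```python
import pandas as pd

def fill_from_dict(frame, mapping):
    """Return a labelled copy of frame; the name and signature are hypothetical."""
    filled = frame.copy()
    # Mark every (key, value) pair from the mapping.
    for row_label, col_labels in mapping.items():
        for col_label in col_labels:
            filled.loc[row_label, col_label] = "Yes"
    # Rows that are still entirely missing had no key in the mapping.
    all_missing = filled.isna().all(axis=1)
    return filled.mask(all_missing, "Empty").fillna("No")

d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
df_test = pd.DataFrame(index=["green", "yellow", "red", "pink"],
                       columns=["bear", "dog", "cat"], dtype=object)
filled = fill_from_dict(df_test, d)
print(filled)
```

The order matters here: the all-missing rows have to be masked before fillna runs, otherwise pink would be filled with 'No' like every other empty cell.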
Upvotes: 4
Reputation: 164773
Here's a largely vectorised solution via pd.get_dummies and pd.DataFrame.reindex:
df = pd.DataFrame.from_dict(d, orient='index')
# Encode only the dict keys; rows missing from d are added by the final
# reindex, so they end up as 'Empty' rather than 'No'
res = pd.get_dummies(df, prefix='', prefix_sep='')\
    .reindex(columns=df_test.columns)\
    .fillna(0).applymap({0: 'No', 1: 'Yes'}.get)\
    .reindex(index=np.hstack((df_test.index, df.index.difference(df_test.index))))\
    .fillna('Empty')
print(res)
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
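To see what the chain starts from, this sketch prints the intermediate indicator table that pd.get_dummies builds from the dict. The astype(int) call is my addition, there only so the values print as 0/1 regardless of the pandas version; it is not part of the answer's code.

```python
import pandas as pd

d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}

# Each dict key becomes a row; shorter lists are padded with missing values.
df = pd.DataFrame.from_dict(d, orient="index")

# One indicator column per animal; missing cells contribute no indicator.
dummies = pd.get_dummies(df, prefix="", prefix_sep="").astype(int)
print(dummies)
```

Note that "cat" has no column at this stage because no key lists it, which is why the chain reindexes the columns against df_test.columns afterwards.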
Upvotes: 2