may

Reputation: 1195

Filling a dataframe from a dictionary's keys and values

I have the following dataframe as an example.

import pandas as pd

df_test = pd.DataFrame(data=None, index=["green","yellow","red","pink"], columns=["bear","dog","cat"], dtype=None, copy=False)


I have the following dictionary, whose keys and values correspond to the index and columns of my dataframe.

d = {"green":["bear","dog"], "yellow":["bear"], "red":["bear"]}

I would like to fill my dataframe according to the keys and values in the dictionary. If a key is not present, I want that row filled with Empty.

Desired output

[image: desired output table]

I can only think of building lists and looping over them. Is there an easier way to implement this, or are there functions that can help me?

Upvotes: 4

Views: 13023

Answers (3)

Mabel Villalba

Reputation: 2598

You can achieve what you want by doing:

import numpy as np

# index_list may contain labels that are not in the original dataframe;
# those rows are filled with 'Empty'.

index_list = ["green", "yellow", "red", "pink", "purple"]

replace_dict = {True: 'Yes', False: 'No', np.nan:'Empty'}

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
        index=x.index), axis=1).reindex(index_list).replace(replace_dict) 

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty

Explanation

You can accomplish what you want by checking, for each row, whether each column name of the dataframe is present in the dict entry for that row's label:

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
    index=x.index), axis=1)

        bear    dog    cat
green   True   True  False
yellow  True  False  False
red     True  False  False

Then reindex with index_list so that colors missing from the dict appear as rows filled with NaN:

index_list = ["green","yellow","red","pink", "purple"]

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
       index=x.index), axis=1).reindex(index_list)

        bear    dog    cat
green   True   True  False
yellow  True  False  False
red     True  False  False
pink     NaN    NaN    NaN
purple   NaN    NaN    NaN

Then, if you want to change the values, you can replace them using a dictionary like this:

replace_dict = {True: 'Yes', False: 'No', np.nan:'Empty'}

df_test.loc[list(d.keys())].apply(lambda x : pd.Series(x.index.isin(d[x.name]),
        index=x.index), axis=1).reindex(index_list).replace(replace_dict) 

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
purple  Empty  Empty  Empty
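The same membership test can also be written without a row-wise apply. The following is a minimal sketch, not part of the original answer: the helper name membership_table is hypothetical, and it assumes every dictionary value is a list of column labels. It builds one boolean row per key with Index.isin, labels the booleans, and lets reindex supply the Empty rows through fill_value.

```python
import numpy as np
import pandas as pd


def membership_table(mapping, index, columns):
    """Build a Yes/No/Empty table (hypothetical helper, not from the answer)."""
    cols = pd.Index(columns)
    # One boolean row per dictionary key: is each column name listed for it?
    flags = np.array([cols.isin(values) for values in mapping.values()])
    flags = flags.reshape(len(mapping), len(cols))
    labels = pd.DataFrame(
        np.where(flags, "Yes", "No"), index=list(mapping), columns=cols
    )
    # Labels that are absent from the mapping become rows filled with "Empty".
    return labels.reindex(index=list(index), fill_value="Empty")


d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
index_list = ["green", "yellow", "red", "pink", "purple"]

result = membership_table(d, index_list, ["bear", "dog", "cat"])
print(result)
```

Passing fill_value to reindex avoids the intermediate NaN rows, so no np.nan key is needed in a replacement dictionary.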

Upvotes: 2

jezrael

Reputation: 863226

Loop over the dictionary and set the matching cells to 'Yes', then use mask to replace rows that are entirely missing with 'Empty', and finally replace the remaining missing values with 'No' using fillna:

for k, v in d.items():
    for x in v:
        df_test.loc[k, x] = 'Yes'

df_test = df_test.mask(df_test.isnull().all(axis=1), 'Empty').fillna('No')
print(df_test)
         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
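One thing to watch with the loop: assigning through .loc with a label that does not exist enlarges the dataframe, so a dictionary key or value that is missing from the index or columns would add a new row or column. Below is a minimal self-contained sketch of the same loop with guards. The function name fill_with_loop is hypothetical, it assumes the input frame starts out entirely NaN, and it marks the untouched rows with a boolean .loc assignment in place of mask.

```python
import pandas as pd


def fill_with_loop(frame, mapping):
    """Loop-based fill that returns a new frame (hypothetical helper)."""
    out = frame.astype(object)  # astype returns a new object; the input is left as-is
    for key, values in mapping.items():
        if key not in out.index:
            continue  # assigning through .loc would otherwise append a new row
        for col in values:
            if col in out.columns:  # likewise, avoid appending a new column
                out.loc[key, col] = "Yes"
    # Rows that are still entirely NaN received no "Yes" at all.
    untouched = out.isna().all(axis=1)
    out.loc[untouched, :] = "Empty"
    # Whatever is still missing sits in a row that has at least one "Yes".
    return out.fillna("No")


d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
df_test = pd.DataFrame(index=["green", "yellow", "red", "pink"],
                       columns=["bear", "dog", "cat"])

result = fill_with_loop(df_test, d)
print(result)
```

Because the function works on a copy, df_test itself stays empty and can be reused with a different dictionary.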

Upvotes: 4

jpp

Reputation: 164773

Here's a largely vectorised solution via pd.get_dummies and pd.DataFrame.reindex:

df = pd.DataFrame.from_dict(d, orient='index')

res = pd.get_dummies(df.reindex(df_test.index), prefix='', prefix_sep='')\
        .reindex(columns=df_test.columns)\
        .fillna(0).applymap({0: 'No', 1: 'Yes'}.get)\
        .reindex(index=np.hstack((df_test.index, df.index.difference(df_test.index))))\
        .fillna('Empty')

print(res)

         bear    dog    cat
green     Yes    Yes     No
yellow    Yes     No     No
red       Yes     No     No
pink    Empty  Empty  Empty
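A caveat on the wide-frame approach: pd.get_dummies encodes each positional column separately, so an animal that appears in different list positions for different keys can yield duplicate column names. The following is a sketch of a variant that first explodes the dictionary into (key, value) pairs. It is an alternative under stated assumptions rather than the answer's own method, and the function name table_from_pairs is hypothetical. It also sidesteps applymap, which pandas 2.1 deprecated in favour of DataFrame.map.

```python
import numpy as np
import pandas as pd


def table_from_pairs(mapping, index, columns):
    """Yes/No/Empty table via explode + get_dummies (hypothetical helper)."""
    # Long form: one (key, value) pair per row, so list positions do not matter.
    pairs = pd.Series(mapping).explode()
    # Indicator columns per value, collapsed back to one row per key.
    hits = pd.get_dummies(pairs).groupby(level=0).max()
    hits = hits.reindex(columns=list(columns), fill_value=False).astype(bool)
    res = pd.DataFrame(
        np.where(hits, "Yes", "No"), index=hits.index, columns=hits.columns
    )
    # Index labels that never occur as a key become rows of "Empty".
    return res.reindex(index=list(index), fill_value="Empty")


d = {"green": ["bear", "dog"], "yellow": ["bear"], "red": ["bear"]}
res = table_from_pairs(d, ["green", "yellow", "red", "pink"], ["bear", "dog", "cat"])
print(res)
```

The groupby step sorts the keys, which is harmless here because the final reindex restores the requested row order.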

Upvotes: 2
