Wiliam

Reputation: 1088

Mapping dictionary as new column efficiently

Suppose I have the following df,

import pandas as pd

d = {'col1': ['cat', 'apple', 'banana', 'dog', 'pen']}
df = pd.DataFrame(d)

that gives

     col1
0     cat
1   apple
2  banana
3     dog
4     pen

I want to make a dictionary and map it as a new column to my df, such that I get the following output:

     col1   col2
0     cat    pet
1   apple  fruit
2  banana  fruit
3     dog    pet
4     pen  thing

I have made the following dictionary:

dictionary = {
  "pet": ['cat','dog'],
  "fruit": ['apple','banana'],
  "thing": 'pen'}

but I am not sure how to use it to produce the output above. A tedious way of doing this is to write out a dictionary with one entry per value and then use map:

di = {"cat": "pet", "dog": "pet", "apple": "fruit", "banana": "fruit", "pen":"thing"}

and

df['col2'] = df['col1'].map(di) 

but I suppose this is not the most efficient way. How can this be done more efficiently?

Upvotes: 2

Views: 53

Answers (2)

Dani Mesejo

Reputation: 61910

Use a dictionary comprehension to explode the lists:

# wrap any non-list value (e.g. 'pen') in a list so every value is a list
dictionary = {k: v if isinstance(v, list) else [v] for k, v in dictionary.items()}

# then invert the dictionary: one {item: category} entry per list element
df['col2'] = df['col1'].map({v: k for k, vs in dictionary.items() for v in vs})
print(df)

Output

     col1   col2
0     cat    pet
1   apple  fruit
2  banana  fruit
3     dog    pet
4     pen  thing
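
The same inversion can be wrapped in a small helper. This is only a sketch: the function name invert_categories, the extra 'rock' row and the 'other' fallback label are hypothetical and not part of the question. It treats a bare string as a single item and shows that values missing from the dictionary come out of map as NaN unless they are filled afterwards.

```python
import pandas as pd


def invert_categories(categories):
    """Turn {category: item or list of items} into {item: category}."""
    lookup = {}
    for category, items in categories.items():
        # a bare string such as 'pen' is treated as a single item
        if isinstance(items, str):
            items = [items]
        for item in items:
            lookup[item] = category
    return lookup


dictionary = {"pet": ['cat', 'dog'], "fruit": ['apple', 'banana'], "thing": 'pen'}
# 'rock' is a hypothetical extra row that has no category
df = pd.DataFrame({'col1': ['cat', 'apple', 'banana', 'dog', 'pen', 'rock']})

lookup = invert_categories(dictionary)
# map() yields NaN for 'rock'; fillna() swaps in the fallback label
df['col2'] = df['col1'].map(lookup).fillna('other')
print(df)
```

If NaN is the preferred marker for unknown values, the fillna call can simply be dropped.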

An alternative using only pandas (although more cumbersome):

# convert to a Series indexed by item, with the category as its value
res = (pd.DataFrame(data=list(dictionary.values()), index=dictionary.keys())
         .stack().droplevel(-1)
         .to_frame('vs').reset_index()
         .set_index('vs').squeeze())

# use map with Series as parameter
df['col2'] = df['col1'].map(res)
print(df)

Output

     col1   col2
0     cat    pet
1   apple  fruit
2  banana  fruit
3     dog    pet
4     pen  thing
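
A shorter pandas-only variant is possible with Series.explode, assuming pandas 0.25 or later. Treat it as a sketch rather than the canonical way: it starts from the dictionary exactly as posted in the question (with 'pen' as a plain string), and the intermediate name exploded is only illustrative.

```python
import pandas as pd

dictionary = {"pet": ['cat', 'dog'], "fruit": ['apple', 'banana'], "thing": 'pen'}
df = pd.DataFrame({'col1': ['cat', 'apple', 'banana', 'dog', 'pen']})

# one row per item: the index repeats the category for each list element,
# and the scalar 'pen' is kept as a single row
exploded = pd.Series(dictionary).explode()

# swap index and values so that the items become the index
res = pd.Series(exploded.index, index=exploded.values)

# use map with the Series as parameter
df['col2'] = df['col1'].map(res)
print(df)
```

This relies on every item appearing under only one category, since map needs a unique index on the Series it looks values up in.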

Upvotes: 2

Eric Truett

Reputation: 3010

I would make a list of tuples and then create the dataframe from that list. It would be simpler if all the values in your dict were lists instead of having plain strings for single values.

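data = []
for k, v in dictionary.items():
    if isinstance(v, str):
        # single value stored as a plain string
        data.append((v, k))
    else:
        for vv in v:
            data.append((vv, k))

# name the columns to match the desired output;
# note that the rows follow the dictionary's order, not the original df's
df = pd.DataFrame(data, columns=['col1', 'col2'])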
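
If the original df has to keep its rows and their order, one option is to build the tuple list into a separate lookup frame and merge it back in. This is a sketch under that assumption: the names lookup and result are illustrative, and the column names are taken from the question's desired output.

```python
import pandas as pd

dictionary = {"pet": ['cat', 'dog'], "fruit": ['apple', 'banana'], "thing": 'pen'}
df = pd.DataFrame({'col1': ['cat', 'apple', 'banana', 'dog', 'pen']})

# build (item, category) pairs, wrapping a plain string in a list first
data = []
for k, v in dictionary.items():
    for vv in ([v] if isinstance(v, str) else v):
        data.append((vv, k))

lookup = pd.DataFrame(data, columns=['col1', 'col2'])

# a left merge keeps every row of df in its original order
result = df.merge(lookup, on='col1', how='left')
print(result)
```

Rows of df whose col1 value has no entry in lookup would get NaN in col2.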

Upvotes: 2
