Andrew Cummins

Reputation: 41

Create a new column in Pandas referencing a dict and other columns

I have a dataframe whose columns (A, B, C) contain 0's and 1's. I also have a dictionary whose keys correspond to the column names. I'm trying to create a new column holding the dictionary value for whichever column contains the 1 in each row, as in the final dataframe below. I know I can probably do this with map, but I'm having a hard time remembering the correct syntax. Any help would be greatly appreciated!

df = pd.DataFrame( {"A": [1, 0, 0], "B": [0, 0, 1],"C": [0, 1, 0]} )

    A   B   C
0   1   0   0
1   0   0   1
2   0   1   0

dict_ids = {'A':123, 'B':345, 'C':678}
{'A': 123, 'B': 345, 'C': 678}

df['ID'] = [123, 678, 345]

Final DF should look like this:

    A   B   C   ID
0   1   0   0   123
1   0   0   1   678
2   0   1   0   345

Upvotes: 1

Views: 52

Answers (2)

Henry Ecker

Reputation: 35636

We can use idxmax to convert each row of binary data to the name of the column holding the 1, then map those names through the dictionary:

df['ID'] = df.idxmax(axis=1).map(dict_ids)

*Note: if a row contains no 1 values, idxmax returns the first column (A), so that row gets A's value; if a row contains multiple 1 values, the first column with a value of 1 is used.
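The all-zero case in the note above can be guarded against by masking rows that contain no 1. This is only a sketch: the example data (with an all-zero row), the names flags and has_one, and the choice of NaN as the "no match" marker are assumptions, not part of the question.

```python
import pandas as pd

# Hypothetical data: row 2 contains no 1 at all
df = pd.DataFrame({"A": [1, 0, 0], "B": [0, 0, 0], "C": [0, 1, 0]})
dict_ids = {"A": 123, "B": 345, "C": 678}

flags = df[list(dict_ids)]         # restrict to the 0/1 columns named in the dict
has_one = flags.eq(1).any(axis=1)  # False for rows without any 1

# idxmax returns the first column holding the row maximum;
# where() then blanks out rows that had no 1 to begin with
df["ID"] = flags.idxmax(axis=1).map(dict_ids).where(has_one)
```

Because of the NaN, the ID column becomes float here; a nullable integer dtype could be used instead if that matters.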


Assuming at most one 1 value per row, we can take the dot product of the DataFrame and a Series created from the dictionary:

df['ID'] = df.dot(pd.Series(dict_ids))

*Note: if a row contains no 1 values the result will be 0; if a row contains multiple 1 values the result will be the sum of the dictionary values for those columns.
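To make the note above concrete, here is a small sketch of how dot behaves when the "at most one 1 per row" assumption does not hold. The example data and the names ids, exactly_one and checked are illustrative assumptions.

```python
import pandas as pd

# Hypothetical data: row 0 has two 1s, row 2 has none
df = pd.DataFrame({"A": [1, 0, 0], "B": [1, 0, 0], "C": [0, 1, 0]})
dict_ids = {"A": 123, "B": 345, "C": 678}

# Each row becomes the sum of the dict values of its 1-columns
ids = df.dot(pd.Series(dict_ids))

# Optional guard: keep a value only where the row has exactly one 1
exactly_one = df.sum(axis=1).eq(1)
checked = ids.where(exactly_one)
```

Row 0 yields 123 + 345 = 468 and row 2 yields 0, neither of which is a real ID, which is why the guard may be worth adding.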

In this case, both options produce:

   A  B  C   ID
0  1  0  0  123
1  0  0  1  678
2  0  1  0  345

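To handle rows with zero or several 1 values, we can multiply the DataFrame by its column headers (so each 1 becomes the column name and each 0 an empty string), replace the names using the dictionary, then aggregate each row into a list: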

# Modified Example:
df = pd.DataFrame({"A": [1, 0, 0], "B": [1, 0, 0], "C": [0, 1, 0]})

dict_ids = {'A': 123, 'B': 345, 'C': 678}

df['ID'] = df.mul(df.columns).replace(dict_ids).aggregate(
    lambda x: x[x.ne('')].tolist(), axis=1
)

df:

   A  B  C          ID
0  1  1  0  [123, 345]
1  0  0  1       [678]
2  0  0  0          []
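For comparison, the same list-valued result can be built with a plain list comprehension over the rows. This is a swapped-in alternative rather than the method shown above, and the name cols is an assumption made for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 0, 0], "B": [1, 0, 0], "C": [0, 1, 0]})
dict_ids = {"A": 123, "B": 345, "C": 678}

cols = list(dict_ids)  # the 0/1 columns, in dict order

# For each row, collect the dict value of every column whose flag is 1
df["ID"] = [
    [dict_ids[c] for c, v in zip(cols, row) if v == 1]
    for row in df[cols].to_numpy()
]
```

This avoids multiplying integers by strings, at the cost of a Python-level loop over the rows.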

Upvotes: 2

wwnde

Reputation: 26676

Another way:

df['D'] = (
    df.astype('boolean')
      .agg(lambda s: s.index[s].values, axis=1)  # per row: array of the column names whose value is 1
      .str.join(',')                             # join the names into one string, e.g. 'A'
      .map(dict_ids)                             # look the name up in the dict (non-matching strings become NaN)
)

Upvotes: 0
