GingerBadger

Reputation: 392

Populate Pandas DataFrame using a dictionary based on a condition

I have a DataFrame

>> test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4], 'C': [np.nan, np.nan, np.nan, np.nan], 'D': [np.nan, np.nan, np.nan, np.nan]})
    A   B   C   D
0   a   1       
1   b   2       
2   b   3       
3   b   4       

I also have a dictionary. The b in input_b signifies that I'm only modifying rows where A == 'b'.

>> input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}

How do I populate the DataFrame with values from the dictionary to get

    A   B   C       D
0   a   1       
1   b   2   Moon    Elephant
2   b   3       
3   b   4   Sun     Mouse

Upvotes: 1

Views: 3968

Answers (4)

raninjan

Reputation: 186

Using map and apply:

test['C'] = test['B'].map(input_b).apply(lambda x: x[0] if isinstance(x, list) else x)
test['D'] = test['B'].map(input_b).apply(lambda x: x[1] if isinstance(x, list) else x)

yields

   A  B     C         D
0  a  1   NaN       NaN
1  b  2  Moon  Elephant
2  b  3   NaN       NaN
3  b  4   Sun     Mouse
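As a variation on the same idea, the mapping can be computed once and the list items pulled out with the .str accessor, which indexes into list values and passes NaN through. This is only a sketch under the question's setup; the name mapped is introduced here for illustration.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
                     'C': np.nan, 'D': np.nan})
input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}

# Map each B value to its list; B values without a key become NaN
mapped = test['B'].map(input_b)

# .str[i] takes element i of each list and leaves NaN untouched
test['C'] = mapped.str[0]
test['D'] = mapped.str[1]

print(test)
```

This avoids mapping the column twice and needs no type check in a lambda.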

Upvotes: 1

jpp

Reputation: 164623

You can use loc indexing after setting your index to B:

test = test.set_index('B')
# pass the keys as a list: newer pandas versions reject a dict as a .loc indexer
test.loc[list(input_b), ['C', 'D']] = list(input_b.values())
test = test.reset_index()

print(test)

   B  A     C         D
0  1  a   NaN       NaN
1  2  b  Moon  Elephant
2  3  b   NaN       NaN
3  4  b   Sun     Mouse

Upvotes: 1

BENY

Reputation: 323226

Using update

test = test.set_index('B')
test.update(pd.DataFrame(input_b, index=['C', 'D']).T)
test = test.reset_index()
test
   B  A     C         D
0  1  a   NaN       NaN
1  2  b  Moon  Elephant
2  3  b   NaN       NaN
3  4  b   Sun     Mouse

Upvotes: 1

Pasa

Reputation: 722

This may not be the most efficient solution, but as far as I can tell it gets the job done:

import pandas as pd
import numpy as np

test = pd.DataFrame({'A': ['a', 'b', 'b', 'b'], 'B': [1, 2, 3, 4],
                     'C': [np.nan, np.nan, np.nan, np.nan], 
                     'D': [np.nan, np.nan, np.nan, np.nan]})


input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}


for key, value in input_b.items():
    test.loc[test['B'] == key, ['C', 'D']] = value

print(test)

Yields:

   A  B     C         D
0  a  1   NaN       NaN
1  b  2  Moon  Elephant
2  b  3   NaN       NaN
3  b  4   Sun     Mouse

The loop runs once per key, so it gets slower as input_b grows (more rows to update means more iterations), but it should be relatively fast for a small input_b, even when the test DataFrame is large.
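If the dictionary does become large, one way to avoid the Python-level loop is to turn the dictionary into a lookup DataFrame and align it to column B in a single step. The following is a rough sketch rather than a benchmarked solution. The names lookup, aligned and mask are introduced here, and C and D are created with object dtype (an assumption, not part of the original setup) so that strings can be written into them without dtype complaints.

```python
import pandas as pd

test = pd.DataFrame({
    'A': ['a', 'b', 'b', 'b'],
    'B': [1, 2, 3, 4],
    # object dtype (assumption) so string values fit without upcasting
    'C': pd.Series([None] * 4, dtype=object),
    'D': pd.Series([None] * 4, dtype=object),
})
input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}

# One row per dictionary key, list items spread across columns C and D
lookup = pd.DataFrame.from_dict(input_b, orient='index', columns=['C', 'D'])

# Align lookup rows to column B; B values missing from input_b become NaN
aligned = lookup.reindex(test['B'])

# Only overwrite the rows whose B value is a key of input_b
mask = test['B'].isin(lookup.index).to_numpy()
test.loc[mask, ['C', 'D']] = aligned.to_numpy()[mask]

print(test)
```

All matching rows are written in one assignment, so the cost no longer depends on a per-key loop.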

This answer also assumes that the keys of input_b refer to values in column B of the original DataFrame. If a value is repeated in B, the same C and D values are written to every matching row.
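If only rows with A == 'b' should be touched, as the question describes, the condition on A can be added to the mask. The sketch below uses a hypothetical variant of the data in which B == 2 also appears in a row where A == 'a'. The object dtype for C and D is likewise an assumption made here.

```python
import pandas as pd

# Hypothetical variant of the question's data: B == 2 occurs twice,
# once in a row where A == 'a'
test = pd.DataFrame({
    'A': ['a', 'b', 'b', 'b'],
    'B': [2, 2, 3, 4],
    'C': pd.Series([None] * 4, dtype=object),
    'D': pd.Series([None] * 4, dtype=object),
})
input_b = {2: ['Moon', 'Elephant'], 4: ['Sun', 'Mouse']}

for key, value in input_b.items():
    # Require both conditions: the key matches B and the row has A == 'b'
    mask = (test['B'] == key) & (test['A'] == 'b')
    test.loc[mask, ['C', 'D']] = value

print(test)
```

Here row 0 is left alone even though its B value is a key of input_b, because its A value is 'a'.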

Upvotes: 3
