Reputation: 3200

Updating a pandas DataFrame row with a dictionary

I've found a behavior in pandas DataFrames that I don't understand.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), index=['one', 'one', 'two'], columns=['col1', 'col2', 'col3'])
new_data = pd.Series({'col1': 'new', 'col2': 'new', 'col3': 'new'})
df.iloc[0] = new_data
# resulting df looks like:

#       col1    col2    col3
#one    new     new     new
#one    9       6       1
#two    8       3       7

But if I try to assign a dictionary instead, I get this:

new_data = {'col1': 'new', 'col2': 'new', 'col3': 'new'}
df.iloc[0] = new_data
# resulting df looks like:
#         col1  col2    col3
#one      col2  col3    col1
#one      2     1       7
#two      5     8       6

Why is this happening? While writing up this question, I realized that df.iloc is most likely taking only the keys from new_data, which would also explain why the values are out of order. But, again, why is this the case? If I create a DataFrame from a dictionary, it treats the keys as columns:

pd.DataFrame([new_data])

#    col1   col2    col3
#0  new     new     new

Why is that not the default behavior when assigning through df.iloc?

Upvotes: 9

Answers (4)

Joe Flack

Reputation: 974

For me on Python 3.9, pandas 1.5.3, this works: df.loc[INDEX, list(MY_DICT.keys())] = list(MY_DICT.values())
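
As a small illustration of that one-liner, here is a sketch with made-up integer data (the frame, the `updates` dict, and `row_label` are hypothetical stand-ins for `MY_DICT` and `INDEX`). It only touches the columns named in the dict and pairs keys with values explicitly, so it does not rely on how pandas interprets a dict on assignment.

```python
import pandas as pd

# Hypothetical example data; integer values keep the column dtypes unchanged.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [4, 5, 6], "col3": [7, 8, 9]})

# Hypothetical partial update: only two of the three columns, in arbitrary order.
updates = {"col3": 90, "col1": 10}

row_label = 1  # stands in for INDEX above

# Keys select the columns, values supply the data, paired position by position.
df.loc[row_label, list(updates.keys())] = list(updates.values())

print(df)
```

Columns that are not in the dict (here `col2`) keep their previous value in that row.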

Upvotes: 1

Markus Dutschke

Reputation: 10626

a compact way

using an intermediate cast to pd.Series

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), columns=['col1', 'col2', 'col3'])
>>> new_data = {'col1': 'new1', 'col2': 'new2', 'col3': 'new3'}
>>> 
>>> df
   col1  col2  col3
0     5     7     9
1     8     7     8
2     5     3     3
>>> new_data
{'col1': 'new1', 'col2': 'new2', 'col3': 'new3'}
>>> 
>>> df.loc[0] = pd.Series(new_data)
>>> df
   col1  col2  col3
0  new1  new2  new3
1     8     7     8
2     5     3     3

Upvotes: 0

Markus Dutschke

Reputation: 10626

just how to do it

This is a compact way to accomplish the task. I removed the index from your df because "one" appears twice, which prevents unique label-based indexing.

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame(np.random.randint(1, 10, (3, 3)), columns=['col1', 'col2', 'col3'])
>>> new_data = {'col1': 'new', 'col2': 'new', 'col3': 'new'}
>>> 
>>> df
   col1  col2  col3
0     1     6     1
1     4     2     3
2     6     2     3
>>> new_data
{'col1': 'new', 'col2': 'new', 'col3': 'new'}
>>> 
>>> df.loc[0, new_data.keys()] = new_data.values()
>>> df
  col1 col2 col3
0  new  new  new
1    4    2    3
2    6    2    3

Upvotes: 3

piRSquared

Reputation: 294516

It's the difference between how a dictionary iterates and how a pandas series is treated.

A pandas Series matches its index to the DataFrame's columns when it is assigned to a row, and to the DataFrame's index when it is assigned to a column. It then assigns each value to the slot whose label it matched.

When the assigned object is not a pandas object with an index to match against, pandas iterates through it. Iterating over a dictionary yields its keys, which is why you see the dictionary keys in that row's slots. Before Python 3.7, dictionaries did not guarantee any iteration order, which is why the keys appear shuffled in that row.
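
The two halves of that explanation can be sketched separately. The example below uses made-up integer data and a unique default index (both are assumptions, chosen so the column dtypes are not changed). It first shows what plain iteration over a dict produces, then shows label alignment when the same dict is wrapped in a Series and assigned through `.loc`. It deliberately does not assign the raw dict, because how pandas handles that may differ between versions.

```python
import pandas as pd

# Keys are deliberately not in column order.
new_data = {"col3": 30, "col1": 10, "col2": 20}

# Plain iteration over a dict yields its keys, never its values.
iterated = list(new_data)
print(iterated)

# Hypothetical frame with a unique default index.
df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# A Series carries labels, so .loc matches them to the column names
# instead of relying on position.
df.loc[0] = pd.Series(new_data)
print(df.loc[0].to_dict())
```

Although the dict lists `col3` first, each value lands under the column that shares its label.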

Upvotes: 8
