Rajat

Reputation: 487

python pandas change data frame cells using iloc

I am trying to modify two values in a single row of a data frame. However, I get an exception that I cannot explain.

In [1]: import numpy as np

In [2]: import pandas as pd

In [3]: df = pd.DataFrame(np.random.rand(2,3), index=['one', 'two'],
                          columns=list('ABC'))

In [4]: df['Z'] = list(range(len(df.index)))

In [5]: df.head(1)
Out[5]: 
            A         B         C  Z
one  0.977917  0.734311  0.069476  0

In [6]: df.iloc[0] = dict(B=3.5, Z=10)

/home/rajatgirotra/tools/miniconda2/envs/shriram/lib/python2.7/site-packages/pandas/core/indexing.pyc in _setitem_with_indexer(self, indexer, value)
    525
    526                 if len(labels) != len(value):
--> 527                     raise ValueError('Must have equal len keys and value '
    528                                      'when setting with an iterable')
    529

ValueError: Must have equal len keys and value when setting with an iterable

Is this approach incorrect? How can I easily modify one or more cell values in the same row?

Upvotes: 3

Views: 4560

Answers (2)

piRSquared

Reputation: 294516

@jezrael's df.iloc[0] = pd.Series(d) is my preference.

But you can also use pd.DataFrame.update and wrap your dictionary in a pd.DataFrame:

df.update(pd.DataFrame(dict(B=3.5, Z=10), ['one']))

df

            A         B         C     Z
one  0.339970  3.500000  0.528206  10.0
two  0.553827  0.117207  0.784605   1.0
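For reference, the update approach above can be written as a self-contained sketch. The fixed values stand in for the random data in the question and are an assumption made for reproducibility. Whether column Z stays integer or is upcast to float afterwards may depend on the pandas version, so the sketch makes no claim about it.

```python
import pandas as pd

# Fixed values replace np.random.rand from the question (assumption, for reproducibility).
df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4], "C": [0.5, 0.6], "Z": [0, 1]},
    index=["one", "two"],
)

# One-row frame carrying only the cells to change; its index label selects the row.
patch = pd.DataFrame({"B": 3.5, "Z": 10}, index=["one"])

# update() aligns on index and columns and overwrites only the matching cells in place.
df.update(patch)
print(df)
```

Cells that do not appear in the patch frame (columns A and C, and all of row 'two') are left as they were.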

While I'm at it, here is a creative way using pd.DataFrame.set_value and a list comprehension. This has the advantage of avoiding the overhead of building a dataframe. Notice also that the dtype of column 'Z' is preserved.

[df.set_value('one', k, v) for k, v in dict(B=3.5, Z=10).items()];

df

            A         B         C   Z
one  0.099669  3.500000  0.248170  10
two  0.604340  0.305114  0.897305   1
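Note that set_value was deprecated and then removed in later pandas releases. On a newer version, a rough equivalent is the .at scalar accessor in a plain loop. This is a minimal sketch rather than the answer's original method, and the fixed values in the frame are an assumption made for reproducibility.

```python
import pandas as pd

# Fixed values replace np.random.rand from the question (assumption, for reproducibility).
df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4], "C": [0.5, 0.6], "Z": [0, 1]},
    index=["one", "two"],
)

# .at sets one cell at a time by row label and column label.
for column, value in {"B": 3.5, "Z": 10}.items():
    df.at["one", column] = value

print(df)
print(df.dtypes)
```

Because each cell is set with a value of the column's own type, the integer column Z is not upcast to float here.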

Not that it matters all that much, but here are the timings on this tiny data sample:

%timeit [df.set_value('one', k, v) for k, v in dict(B=3.5, Z=10).items()];
%timeit df.update(pd.DataFrame(dict(B=3.5, Z=10), ['one']))
%timeit df.iloc[0] = pd.Series(dict(B=3.5, Z=10))

100000 loops, best of 3: 5.29 µs per loop
1000 loops, best of 3: 1.51 ms per loop
1000 loops, best of 3: 402 µs per loop

Upvotes: 3

jezrael

Reputation: 863531

I think you need to select only the columns named by the dict's keys, using loc or iloc; otherwise the remaining columns in the row are set to NaN:

d = dict(B=3.5, Z=10)
df.loc[df.index[0], d.keys()] = pd.Series(d)
print (df)
            A         B         C     Z
one  0.062352  3.500000  0.225811  10.0
two  0.655920  0.386443  0.063906   1.0
df.iloc[0, df.columns.get_indexer(d.keys())] = pd.Series(d)
print (df)
            A         B         C     Z
one  0.422479  3.500000  0.951087  10.0
two  0.097426  0.702746  0.257591   1.0
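The column-selection idea above can be sketched in a self-contained form. A plausible reading of the original error is that the dict has two entries while the row has four columns, so the lengths do not match. Restricting the assignment to the dict's keys makes them match. The fixed values in the frame are an assumption made for reproducibility, and list(d) is used instead of d.keys() so that the column selector is a plain list.

```python
import pandas as pd

# Fixed values replace np.random.rand from the question (assumption, for reproducibility).
df = pd.DataFrame(
    {"A": [0.1, 0.2], "B": [0.3, 0.4], "C": [0.5, 0.6], "Z": [0, 1]},
    index=["one", "two"],
)

d = {"B": 3.5, "Z": 10}

# Select the first row and only the columns named in the dict,
# then assign a Series whose index matches those column labels.
df.loc[df.index[0], list(d)] = pd.Series(d)

print(df)
```

Columns A and C in row 'one' keep their values because they are not part of the selection.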

df.loc[df.index[0]] = pd.Series(d)
print (df)
            A         B         C     Z
one       NaN  3.500000       NaN  10.0
two  0.050399  0.917007  0.951725   1.0
df.iloc[0] = pd.Series(d)
print (df)
          A         B         C     Z
one     NaN  3.500000       NaN  10.0
two  0.5356  0.844221  0.023227   1.0

Upvotes: 3
