Efficient way to update column value for subset of rows on Pandas DataFrame?

Question

When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?

A simple example:

import pandas as pd

df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
                   'value' : pd.Series([1., 2., 3., 4.])})
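For reference, here is a minimal sketch (using the same example data) of what the length condition used below evaluates to. It yields a boolean Series aligned with the frame's index, and that Series is what selects the rows. The name `mask` is only an illustrative variable, not something from the original post.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Vectorised string length of every name, compared with 4.
# The result is a boolean Series sharing df's index.
mask = df['name'].str.len() == 4

print(mask.tolist())                  # [True, True, False, False]
print(df.loc[mask, 'name'].tolist())  # ['Alex', 'John']
```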

Objective: update the value column based on the length of the name and on the initial value of the value column itself.

The following line achieves the objective:

df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000

However, this line filters the whole data frame twice, once on the left-hand side and once on the right-hand side. I assume this is not the most efficient way, and it also does not perform the update 'in place'.

Basically, I'm looking for the pandas equivalent of R data.table's ':=' operator:

df[nchar(name) == 4, value := value*1000]

And for other kinds of operations, such as:

df[nchar(name) == 4, value := paste0("short_", as.character(value))]

Environment: Python 3.6, Pandas 0.22

Thanks in advance.

jpp · Accepted Answer

This may be what you require:

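# Multiply 'value' by 1000 in the rows whose name is exactly 4 characters long
df.loc[df.name.str.len() == 4, 'value'] *= 1000

# Prefix the same rows with 'short_'; the right-hand side is aligned on the index,
# so only the selected rows receive the new strings
df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)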
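A self-contained sketch of both updates, assuming the example frame from the question. It evaluates the condition only once (stored in an illustrative `mask` variable) and writes through `.loc`. The `astype(object)` step is an addition not present in the answer above: it is meant to avoid the incompatible-dtype warning that newer pandas releases emit when strings are written into a float64 column (pandas 0.22 simply upcasts the column).

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Evaluate the condition a single time and reuse it for every update.
mask = df['name'].str.len() == 4

# Numeric update, analogous to data.table's value := value*1000.
# Only the selected rows are overwritten; the other rows are untouched.
df.loc[mask, 'value'] *= 1000
print(df['value'].tolist())  # [1000.0, 2000.0, 3.0, 4.0]

# String update, analogous to value := paste0("short_", as.character(value)).
# Cast to object first so the column can hold both strings and floats.
df['value'] = df['value'].astype(object)
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)
print(df['value'].tolist())  # ['short_1000.0', 'short_2000.0', 3.0, 4.0]
```

Note that the string step runs after the numeric one here, so the prefixed values reflect the multiplied numbers.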
