AlexSB

Reputation: 607

Efficient way to update column value for subset of rows on Pandas DataFrame?

When using Pandas to update the value of a column for a specific subset of rows, what is the best way to do it?

Easy example:

import pandas as pd

df = pd.DataFrame({'name' : pd.Series(['Alex', 'John', 'Christopher', 'Dwayne']),
                   'value' : pd.Series([1., 2., 3., 4.])})

Objective: update the value column based on the length of each name and on the initial value of the value column itself.

The following line achieves the objective:

df.value[df.name.str.len() == 4 ] = df.value[df.name.str.len() == 4] * 1000

However, this line filters the whole data frame twice, once on the LHS and once on the RHS, so I assume it is not the most efficient way. It also does not update the column 'in place'.

Basically I'm looking for the pandas equivalent to R data.table ':=' operator:

df[nchar(name) == 4, value := value*1000]

And for other kinds of operations, such as:

df[nchar(name) == 4, value := paste0("short_", as.character(value))]

Environment: Python 3.6, Pandas 0.22

Thanks in advance.

Upvotes: 10

Views: 9259

Answers (2)

jpp

Reputation: 164703

This may be what you require:

df.loc[df.name.str.len() == 4, 'value'] *= 1000

df.loc[df.name.str.len() == 4, 'value'] = 'short_' + df['value'].astype(str)
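
A minimal sketch of the string case on its own, starting from the original float values. It assumes a recent pandas version, where writing strings into a float64 column may emit a dtype warning or fail, so the column is cast to object first. The `mask` variable is just an illustrative name; computing it once also avoids filtering the frame twice.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                   'value': [1., 2., 3., 4.]})

# Build the boolean filter once and reuse it on both sides
mask = df['name'].str.len() == 4

# Assumption: cast to object so the column can hold floats and strings together
df['value'] = df['value'].astype(object)

# Only the masked rows are read on the right and written on the left
df.loc[mask, 'value'] = 'short_' + df.loc[mask, 'value'].astype(str)
```

Rows that do not match the mask keep their original numeric values, so the column ends up with mixed types.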

Upvotes: 4

jezrael

Reputation: 862841

You need loc with *=:

df.loc[df.name.str.len() == 4, 'value'] *= 1000
print (df)
          name   value
0         Alex  1000.0
1         John  2000.0
2  Christopher     3.0
3       Dwayne     4.0

EDIT:

More general solutions:

mask = df.name.str.len() == 4
df.loc[mask, 'value'] = df.loc[mask, 'value'] * 1000

Or:

df.update(df.loc[mask, 'value'] * 1000)
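
As a rough check, the three variants above can be run side by side on fresh copies of the sample data. This is only a sketch: `make_df` is a hypothetical helper introduced here, not part of pandas. Note that `DataFrame.update` modifies the frame in place, aligns on the index, and ignores NaN values in the object passed to it.

```python
import pandas as pd

def make_df():
    # Hypothetical helper: rebuilds the sample frame from the question
    return pd.DataFrame({'name': ['Alex', 'John', 'Christopher', 'Dwayne'],
                         'value': [1., 2., 3., 4.]})

a, b, c = make_df(), make_df(), make_df()

# The filter is computed once; all three frames share the same index
mask = a['name'].str.len() == 4

# Variant 1: augmented assignment through loc
a.loc[mask, 'value'] *= 1000

# Variant 2: explicit right-hand side, reusing the same mask
b.loc[mask, 'value'] = b.loc[mask, 'value'] * 1000

# Variant 3: update() aligns the named Series with the 'value' column by index
c.update(c.loc[mask, 'value'] * 1000)

same = a.equals(b) and b.equals(c)
print(same)
```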

Upvotes: 8
