jtcloud

Reputation: 559

[Pandas] How to assign a new column based on an if statement

I know that assign can create or change a column from a lambda function, like this:

df.assign(c = lambda x: x.sum())

But I couldn't find a way to do this with an if statement written inline, inside the assign call, rather than as a separate step beforehand.

Is it possible to do this entirely within the operation, like so:

df.assign(c = lambda x: x.num_col.sum() if x.num_col > 0)

The above command raises "SyntaxError: invalid syntax".

Upvotes: 1

Views: 376

Answers (2)

MaxU - stand with Ukraine

Reputation: 210982

IIUC you can do it this way:

Data:

In [6]: df = pd.DataFrame(np.random.randn(10,2),columns=list('ab'))

In [7]: df
Out[7]:
          a         b
0  0.493970  1.095644
1  0.128510 -0.542144
2  0.136247 -0.544499
3 -0.540835 -0.100574
4  0.052725 -0.164856
5 -1.201619  1.578153
6  1.921872  0.505875
7 -2.519725  0.282050
8 -1.581868 -0.240352
9 -0.071207 -1.366953

In [8]: df.loc[:6]
Out[8]:
          a         b
0  0.493970  1.095644
1  0.128510 -0.542144
2  0.136247 -0.544499
3 -0.540835 -0.100574
4  0.052725 -0.164856
5 -1.201619  1.578153
6  1.921872  0.505875

Let's find the sum of the positive values in column a for index labels 0 through 6:

In [9]: df.loc[:6].query('a > 0').a.sum()
Out[9]: 2.733322288547374

Solution:

In [10]: df.loc[:6].assign(c=lambda x: x.query('a > 0').a.sum())
Out[10]:
          a         b         c
0  0.493970  1.095644  2.733322
1  0.128510 -0.542144  2.733322
2  0.136247 -0.544499  2.733322
3 -0.540835 -0.100574  2.733322
4  0.052725 -0.164856  2.733322
5 -1.201619  1.578153  2.733322
6  1.921872  0.505875  2.733322
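For anyone who wants to reproduce this without random data, here is a minimal sketch. It is not part of the session above: it hard-codes the first seven values of column a as printed (so they are rounded), and it uses a boolean mask in place of query(), which is an equivalent alternative rather than the method shown in the answer.

```python
import pandas as pd

# First seven values of column 'a', copied from the printed output (rounded)
df = pd.DataFrame(
    {"a": [0.493970, 0.128510, 0.136247, -0.540835, 0.052725, -1.201619, 1.921872]}
)

# Select the positive values of 'a' with a boolean mask, then sum them.
# assign() broadcasts the resulting scalar to every row of the new column.
result = df.assign(c=lambda x: x.loc[x["a"] > 0, "a"].sum())
positive_sum = result["c"].iloc[0]
```

Because the lambda returns a single number, every row of c holds the same value, just as in the output above.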

The same with renamed columns:

In [11]: df.loc[:6].rename(columns={'a':'AAA', 'b':'BBB'}).assign(c=lambda x: x.query('AAA > 0').AAA.sum())
Out[11]:
        AAA       BBB         c
0  0.493970  1.095644  2.733322
1  0.128510 -0.542144  2.733322
2  0.136247 -0.544499  2.733322
3 -0.540835 -0.100574  2.733322
4  0.052725 -0.164856  2.733322
5 -1.201619  1.578153  2.733322
6  1.921872  0.505875  2.733322

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.
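The two replacement indexers slice differently, which matters when picking one for a range of rows. A small sketch, assuming a frame with the default 0..9 integer index (the names by_label and by_position are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": range(10)})  # default integer index 0..9

# .loc slices by label and includes the end label: rows 0..6
by_label = df.loc[:6]

# .iloc slices by position and excludes the end position: rows 0..5
by_position = df.iloc[:6]

# To get the same seven rows by position, the stop has to be 7
by_position_7 = df.iloc[:7]
```

With a default integer index, df.loc[:6] and df.iloc[:7] therefore select the same rows.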

Upvotes: 2

Michael Griffiths

Reputation: 1427

The syntax is invalid because you're using a conditional (ternary) expression but only wrote the first half: the else branch is missing.

A conditional expression lets you write an if/else on a single line, like this:

a = 1 if b > 0 else 0
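As a quick check that the else branch really is mandatory, here is a small sketch in plain Python. The helper name sign_flag is made up for illustration, and compile() is used only so the syntax error can be caught instead of stopping the script.

```python
def sign_flag(b):
    # A conditional expression has the form: <value> if <condition> else <other>
    return 1 if b > 0 else 0


# Leaving out the else branch is rejected by the parser.
try:
    compile("a = 1 if b > 0", "<example>", "exec")
    missing_else_is_error = False
except SyntaxError:
    missing_else_is_error = True
```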

In your case, you could write something like:

df = (
  df
  .assign(c = lambda x: x.num_col.sum() if x.num_col > 0 else 0)
)

Note the addition of else 0 at the end, which is what resolves the SyntaxError.
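One caveat with the snippet above: x.num_col > 0 is a boolean Series, not a single True/False, so pandas raises a ValueError ("The truth value of a Series is ambiguous") when Python tries to evaluate it as a condition. The question doesn't say which behaviour is wanted, so the sketch below shows two possible readings on made-up data. Both are assumptions about the intent, not the asker's confirmed goal.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num_col": [1.0, -2.0, 3.0]})  # made-up example data

# Using the Series comparison directly as a condition fails at runtime.
try:
    df.assign(c=lambda x: x.num_col.sum() if x.num_col > 0 else 0)
    raised = False
except ValueError:
    raised = True

# Reading 1: one decision for the whole column.
# Reduce the Series to a single boolean with .all() (or .any()).
all_positive = df.assign(
    c=lambda x: x.num_col.sum() if (x.num_col > 0).all() else 0
)

# Reading 2: a decision per row.
# np.where picks the column sum where num_col is positive and 0 elsewhere.
row_wise = df.assign(c=lambda x: np.where(x.num_col > 0, x.num_col.sum(), 0))
```

If the goal is instead to sum only the positive values, filtering before summing (as in the other answer) is the more direct route.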

Upvotes: -1
