Reputation: 559
I know assign can create or change a column based on a lambda function, like this:
df.assign(c = lambda x: x.sum())
But I couldn't find a way to do this with an inline if-statement, rather than handling the condition separately outside the operation.
Is it possible to do this without anything else outside the operation:
df.assign(c = lambda x: x.num_col.sum() if x.num_col > 0)
The above command returns "SyntaxError: invalid syntax"
Upvotes: 1
Views: 376
Reputation: 210982
IIUC you can do it this way:
Data:
In [6]: df = pd.DataFrame(np.random.randn(10,2),columns=list('ab'))
In [7]: df
Out[7]:
a b
0 0.493970 1.095644
1 0.128510 -0.542144
2 0.136247 -0.544499
3 -0.540835 -0.100574
4 0.052725 -0.164856
5 -1.201619 1.578153
6 1.921872 0.505875
7 -2.519725 0.282050
8 -1.581868 -0.240352
9 -0.071207 -1.366953
In [8]: df.loc[:6]
Out[8]:
a b
0 0.493970 1.095644
1 0.128510 -0.542144
2 0.136247 -0.544499
3 -0.540835 -0.100574
4 0.052725 -0.164856
5 -1.201619 1.578153
6 1.921872 0.505875
Let's find the sum of the positive values in column a for index labels 0 through 6:
In [9]: df.loc[:6].query('a > 0').a.sum()
Out[9]: 2.733322288547374
Solution:
In [10]: df.loc[:6].assign(c=lambda x: x.query('a > 0').a.sum())
Out[10]:
a b c
0 0.493970 1.095644 2.733322
1 0.128510 -0.542144 2.733322
2 0.136247 -0.544499 2.733322
3 -0.540835 -0.100574 2.733322
4 0.052725 -0.164856 2.733322
5 -1.201619 1.578153 2.733322
6 1.921872 0.505875 2.733322
The same with renamed columns:
In [11]: df.loc[:6].rename(columns={'a':'AAA', 'b':'BBB'}).assign(c=lambda x: x.query('AAA > 0').AAA.sum())
Out[11]:
AAA BBB c
0 0.493970 1.095644 2.733322
1 0.128510 -0.542144 2.733322
2 0.136247 -0.544499 2.733322
3 -0.540835 -0.100574 2.733322
4 0.052725 -0.164856 2.733322
5 -1.201619 1.578153 2.733322
6 1.921872 0.505875 2.733322
UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.
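A reproducible sketch of the same idea, using fixed numbers instead of random data (the frame below is made up for illustration, so the sums can be checked by hand). It also shows why the choice of slice matters: `.loc[:6]` is label-based and includes row 6, while `.iloc[:6]` is position-based and stops at row 5.

```python
import pandas as pd

# Hypothetical data, chosen so the sums are easy to verify by hand.
df = pd.DataFrame({
    "a": [1.0, -2.0, 3.0, -4.0, 5.0, -6.0, 7.0, 8.0],
    "b": [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5],
})

# Label-based slice: rows 0..6 inclusive.
by_label = df.loc[:6].assign(c=lambda x: x.query("a > 0").a.sum())

# Position-based slice: rows 0..5 only, so row 6 is not part of the sum.
by_position = df.iloc[:6].assign(c=lambda x: x.query("a > 0").a.sum())

# Same result as by_label, using a boolean mask instead of query().
masked = df.loc[:6].assign(c=lambda x: x.loc[x.a > 0, "a"].sum())

print(by_label)
print(by_position)
```

In both cases the lambda receives the already-sliced frame, so the condition and the sum are evaluated inside the assign call and nothing has to be computed beforehand.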
Upvotes: 2
Reputation: 1427
The syntax is invalid because you're using a conditional expression (the ternary operator), but only the first half of it.
A conditional expression lets you write an if statement inline, and it always needs an else branch:
a = 1 if b > 0 else 0
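In your case, the condition also has to be a single boolean, so reduce the comparison with .all() (or .any()). You could write something like:
df = (
    df
    .assign(c = lambda x: x.num_col.sum() if (x.num_col > 0).all() else 0)
)
Note the addition of the else 0 at the end. Without the .all(), x.num_col > 0 is a whole Series, and pandas raises "ValueError: The truth value of a Series is ambiguous".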
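A small sketch of why the condition has to be a single boolean, and of two ways to handle it. Comparing a Series to 0 gives another Series, and using that directly in `if ... else` raises a ValueError. The data and the choice between `.all()` and a row-by-row `np.where` are assumptions for illustration; which one is right depends on what the new column is meant to hold.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"num_col": [3, -1, 4, -2]})  # hypothetical data

# Using the Series comparison directly as the condition fails at runtime.
try:
    df.assign(c=lambda x: x.num_col.sum() if x.num_col > 0 else 0)
    raised = False
except ValueError:
    raised = True

# Option 1: reduce the condition to one boolean, then pick one scalar
# that is broadcast to every row.
scalar_cond = df.assign(
    c=lambda x: x.num_col.sum() if (x.num_col > 0).all() else 0
)

# Option 2: decide row by row; rows with num_col > 0 get the total,
# the other rows get 0.
per_row = df.assign(
    c=lambda x: np.where(x.num_col > 0, x.num_col.sum(), 0)
)

print(raised)
print(scalar_cond)
print(per_row)
```

Option 1 keeps the conditional expression from the answer. Option 2 drops it in favour of a vectorised choice, which is usually what is wanted when the condition differs between rows.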
Upvotes: -1