Reputation: 8455
Take a random DataFrame:
df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])
Pandas allows defining new columns in two ways:
df['c'] = df.a + df.b
df['c'] = df['a'] + df['b']
As the DataFrame name gets longer, this notation becomes less readable.
And then there's the query function:
df.query('a > b')
It returns the rows of df that match the condition.
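For comparison, here is a minimal sketch of what that query call is equivalent to with plain boolean indexing. The seed is only there to make the sample data reproducible and is not part of the original example.

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed, only for reproducibility
df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# query() evaluates the string against the column names...
by_query = df.query('a > b')

# ...and selects the same rows as an explicit boolean mask.
by_mask = df[df.a > df.b]

print(by_query.equals(by_mask))
```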
Is there something like DataFrame.query(), but for operations on the frame, such as defining a new column?
Upvotes: 1
Views: 335
Reputation: 8455
The DataFrame.eval() function does exactly this:
df.eval('c = a + b')
To assign in place (older pandas versions warn when inplace is omitted):
df.eval('c = a + b', inplace=True)
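A small sketch of the difference between the two calls, assuming a recent pandas release where inplace defaults to False. The variable names result and returned are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# Without inplace=True, eval returns a new DataFrame and leaves df alone
# (assumption: a recent pandas where inplace defaults to False).
result = df.eval('c = a + b')
had_c_before = 'c' in df.columns
print(had_c_before, 'c' in result.columns)

# With inplace=True, eval returns None and adds the column to df itself.
returned = df.eval('c = a + b', inplace=True)
print(returned, 'c' in df.columns)
```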
More generally, pandas.eval():
The following arithmetic operations are supported: +, -, *, /, **, %, // (python engine only) along with the following boolean operations: | (or), & (and), and ~ (not). Additionally, the 'pandas' parser allows the use of and, or, and not with the same semantics as the corresponding bitwise operators.
The pandas docs say that eval supports only Python expression statements (e.g., a == b), but pandas silently supports abs(a - b) and possibly other function calls. Anything else raises an error. For example:
df.eval('del(a)')
raises NotImplementedError: 'Delete' nodes are not implemented.
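To make this concrete, here is a rough sketch that defines two columns in a single multi-line eval call (one assignment per line) and then catches the error from the unsupported statement. The names out and caught are illustrative, and the exact set of supported functions may vary between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# Several assignments in one call, one per line; returns a new DataFrame.
out = df.eval('''
c = a + b
d = abs(a - b)
''')
print(out.columns.tolist())

# Statements such as del are rejected by the expression parser.
caught = None
try:
    df.eval('del(a)')
except NotImplementedError as exc:
    caught = exc
    print(exc)
```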
Upvotes: 2
Reputation: 294478
Consider the dataframe named my_obnoxiously_long_dataframe_name:
np.random.seed([3,1415])
my_obnoxiously_long_dataframe_name = pd.DataFrame(
    np.random.randint(10, size=(10, 10)),
    columns=list('ABCDEFGHIJ')
)
my_obnoxiously_long_dataframe_name
   A  B  C  D  E  F  G  H  I  J
0  0  2  7  3  8  7  0  6  8  6
1  0  2  0  4  9  7  3  2  4  3
2  3  6  7  7  4  5  3  7  5  9
3  8  7  6  4  7  6  2  6  6  5
4  2  8  7  5  8  4  7  6  1  5
5  2  8  2  4  7  6  9  4  2  4
6  6  3  8  3  9  8  0  4  3  0
7  4  1  5  8  6  0  8  7  4  6
8  3  5  8  5  1  5  1  4  3  9
9  5  5  7  0  3  2  5  8  8  9
If you want cleaner code, bind the frame to a shorter temporary name:
d_ = my_obnoxiously_long_dataframe_name
d_['K'] = abs(d_.J - d_.D)
d_['L'] = d_.A + d_.B
del d_
my_obnoxiously_long_dataframe_name
   A  B  C  D  E  F  G  H  I  J  K   L
0  0  2  7  3  8  7  0  6  8  6  3   2
1  0  2  0  4  9  7  3  2  4  3  1   2
2  3  6  7  7  4  5  3  7  5  9  2   9
3  8  7  6  4  7  6  2  6  6  5  1  15
4  2  8  7  5  8  4  7  6  1  5  0  10
5  2  8  2  4  7  6  9  4  2  4  0  10
6  6  3  8  3  9  8  0  4  3  0  3   9
7  4  1  5  8  6  0  8  7  4  6  2   5
8  3  5  8  5  1  5  1  4  3  9  4   8
9  5  5  7  0  3  2  5  8  8  9  9  10
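The reason this works is that the short name is a second reference to the same object, not a copy. A quick sketch to check that (same_object is just an illustrative name, and the data here is unseeded, so the values will differ from the output above):

```python
import numpy as np
import pandas as pd

my_obnoxiously_long_dataframe_name = pd.DataFrame(
    np.random.randint(10, size=(10, 10)),
    columns=list('ABCDEFGHIJ')
)

# Plain assignment binds a second name to the same DataFrame object.
d_ = my_obnoxiously_long_dataframe_name
same_object = d_ is my_obnoxiously_long_dataframe_name
print(same_object)

# Columns added through the alias therefore show up on the original.
d_['L'] = d_.A + d_.B

# del removes only the short name; the DataFrame itself is untouched.
del d_
print('L' in my_obnoxiously_long_dataframe_name.columns)
```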
Upvotes: 1
Reputation: 153500
Here's a way using assign and add:
df.assign(c=df.a.add(df.b))
          a         b         c
0  0.086468  0.978044  1.064512
1  0.270727  0.789762  1.060489
2  0.150097  0.662430  0.812527
Note: assign returns a new DataFrame rather than modifying the original, so your original data is left untouched. To keep the new column, assign the result to a different variable or back to df.
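If the goal is to avoid repeating a long frame name, assign also accepts a callable, which receives the DataFrame as its argument. A minimal sketch (new_df and the lambda parameter d are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(3, 2), columns=['a', 'b'])

# The lambda is called with the frame itself, so the frame's name
# appears only once no matter how long it is.
new_df = df.assign(c=lambda d: d.a + d.b)

print('c' in df.columns)      # the original is unchanged
print('c' in new_df.columns)
```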
Upvotes: 1