Say I have a dataframe with 7 columns. I'm only interested in columns A and B. Column B contains numerical values.
What I want to do is select only columns A and B, after applying some mathematical operation f to B. The SQL equivalent of what I'm saying is:
SELECT A, f(B)
FROM df;
I know that I can select just columns A and B by doing df[['A', 'B']]. Also, I can just add another column f_B by saying df['f_B'] = f(df['B']), and then select df[['A', 'f_B']].
However, is there a way of doing it without adding an extra column? What if f is as simple as dividing by 100 or something?
EDIT: I do not want to use pandasql
EDIT2: Sharing sample input and expected output:
Input:
A | B | C | D
--------------
a | 1 | c | d
b | 2 | c | d
c | 3 | c | d
d | 4 | c | d
Expected output (only columns A and B required), assuming f is multiply by 2:
A | B
-----
a | 2
b | 4
c | 6
d | 8
Upvotes: 0
Views: 703
Reputation: 321
First you take only the columns you need:
df = df[['A', 'B']].copy() # replace the original df with a smaller one
new_df = df[['A', 'B']].copy() # or keep the original and work on a new one
You can simply do:
df.B = df.B / 10
Using lambda:
df.B = df.B.apply(lambda value: value / 10)
For more complicated cases:
def f(value):
    # some logic
    result = value ** 2
    return result

df.B = df.B.apply(f)
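If the goal is to get A and f(B) in a single expression without touching the original frame, one possible approach is DataFrame.assign, which returns a new frame with the column overwritten. The sketch below rebuilds the sample input from the question. The function f here is only a stand-in (multiply by 2, as in the expected output), and the name result is arbitrary.

```python
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": [1, 2, 3, 4],
    "C": ["c"] * 4,
    "D": ["d"] * 4,
})

def f(series):
    # Stand-in for f: multiply by 2, as in the question's expected output.
    return series * 2

# Select A and B, then overwrite B in the frame that assign returns.
# The original df keeps all of its columns and its original B values.
result = df[["A", "B"]].assign(B=lambda d: f(d["B"]))
print(result)
```

Since f here works on the whole column at once, no apply is needed. For an f that only handles scalars, d["B"].apply(f) could be used inside the lambda instead.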
Upvotes: 2