ghost

Reputation: 1217

How to select multiple columns in pandas dataframe after doing some operation on one column?

Say I have a dataframe with 7 columns. I'm only interested in columns A and B. Column B contains numerical values.

What I want to do is select only columns A and B, after applying some mathematical operation f to B. The SQL equivalent of what I want is:

SELECT A, f(B)
FROM df;

I know that I can select just columns A and B by doing df[['A', 'B']]. Also, I can just add another column f_B saying: df['f_B'] = f(df['B']), and then select df[['A', 'f_B']].

However, is there a way to do it without adding an extra column? What if f is as simple as dividing by 100?

EDIT: I do not want to use pandasql

EDIT2: Sharing sample input and expected output:

Input:

A | B | C | D
--------------
a | 1 | c | d
b | 2 | c | d
c | 3 | c | d
d | 4 | c | d

Expected output (only columns A and B required), assuming f is multiply by 2:

A | B
-----
a | 2
b | 4
c | 6
d | 8

Upvotes: 0

Views: 703

Answers (1)

Andrei R.

Reputation: 321

First, take only the columns you need:

    df = df[['A', 'B']]  # replace the original df with a smaller one
    new_df = df[['A', 'B']].copy()  # or keep df and work on an independent copy

Then apply the operation to column B:

  1. You can simply do:

    df.B = df.B / 10
    
  2. Using lambda:

    df.B = df.B.apply(lambda value: value / 10)
    
  3. For more complicated cases:

    def f(value):
        # some logic
        result = value ** 2
        return result
    
    df.B = df.B.apply(f)
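
To avoid both the extra column and any change to the original df, one option is `DataFrame.assign`, which returns a new DataFrame with the given column replaced. The following is a minimal sketch using the sample data from the question; here `f` is an assumed stand-in that multiplies by 2, matching the expected output, and `result` is just an illustrative name.

```python
import pandas as pd

# Sample input from the question
df = pd.DataFrame({
    "A": ["a", "b", "c", "d"],
    "B": [1, 2, 3, 4],
    "C": ["c", "c", "c", "c"],
    "D": ["d", "d", "d", "d"],
})


def f(values):
    # Assumed example operation: multiply by 2 (vectorized over the Series)
    return values * 2


# Select A and B, then overwrite B in the returned copy.
# The original df keeps all four columns and its original B values.
result = df[["A", "B"]].assign(B=lambda d: f(d["B"]))
print(result)
```

Because `f` receives the whole Series here, simple arithmetic such as dividing by 100 stays vectorized; for an element-wise function, `d["B"].apply(f)` could be used inside the lambda instead.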
    

Upvotes: 2
