Florian

Reputation: 35

How can I access certain columns in a DataFrame based on a list?

I have created a DataFrame:

import pandas as pd
import random
# 6 rows x 5 columns of uniform random numbers in [0, 1]
data = [[random.uniform(0, 1) for _ in range(5)] for _ in range(6)]
df = pd.DataFrame(data, columns=["A", "B", "C", "D", "E"])

The DataFrame looks like this:

          A         B         C         D         E
0  0.736739  0.184075  0.727951  0.173798  0.184594
1  0.047031  0.567518  0.103112  0.094116  0.050785
2  0.955045  0.754968  0.235842  0.710304  0.109404
3  0.426293  0.617942  0.304042  0.043034  0.798327
4  0.415225  0.461497  0.263462  0.621364  0.974682
5  0.936775  0.822425  0.073169  0.634906  0.140092

What I want to do now is divide certain columns by a number, for example 2. In this case I'd like to divide columns B, D, and E by two. For that I'd like to use a list, because in my real problem the column labels are equity names and the ones I'd like to divide are saved in a list.

That's what I tried:

lst = ["B", "D", "E"]  # named lst so the built-in list is not shadowed
df1 = df[df.columns.intersection(lst)] * 0.5
print(df1)

And the result looks like this:

          B         D         E
0  0.092038  0.086899  0.092297
1  0.283759  0.047058  0.025392
2  0.377484  0.355152  0.054702
3  0.308971  0.021517  0.399164
4  0.230749  0.310682  0.487341
5  0.411212  0.317453  0.070046

But what I get is a DataFrame that contains only the columns from the list. I'd like the result to contain the newly calculated values plus the original values from the columns that weren't in the list.

The result should look like this:

          A         B         C         D         E
0  0.736739  0.092038  0.727951  0.086899  0.092297
1  0.047031  0.283759  0.103112  0.047058  0.025392
2  0.955045  0.377484  0.235842  0.355152  0.054702
3  0.426293  0.308971  0.304042  0.021517  0.399164
4  0.415225  0.230749  0.263462  0.310682  0.487341
5  0.936775  0.411212  0.073169  0.317453  0.070046

Does anyone know how I can solve this problem? Your help is very much appreciated.

Best regards!

Upvotes: 2

Views: 63

Answers (5)

Ian

Reputation: 3898

Try this:

df[['B','D','E']] = df[['B','D','E']]*0.5

@metasomite pointed out, in a proposed edit, a simplification using *=:

df.loc[:, ['B', 'D', 'E']] *= 0.5

df now looks like this:

          A         B         C         D         E
0  0.736739  0.092037  0.727951  0.086899  0.092297
1  0.047031  0.283759  0.103112  0.047058  0.025392
2  0.955045  0.377484  0.235842  0.355152  0.054702
3  0.426293  0.308971  0.304042  0.021517  0.399164
4  0.415225  0.230748  0.263462  0.310682  0.487341
5  0.936775  0.411212  0.073169  0.317453  0.070046
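
Since the question keeps the column names in a list, here is a minimal sketch of the same assignment driven by a list variable. The small fixed DataFrame and the name `wanted` are assumptions for illustration (including the deliberately missing "Z"); `Index.intersection` is the same call the question already uses and drops names that are not columns.

```python
import pandas as pd

# Small deterministic frame standing in for the random data in the question.
df = pd.DataFrame(
    {
        "A": [1.0, 2.0],
        "B": [4.0, 6.0],
        "C": [3.0, 5.0],
        "D": [8.0, 10.0],
        "E": [2.0, 4.0],
    }
)

# Hypothetical list of names; "Z" is deliberately not a column of df.
wanted = ["B", "D", "E", "Z"]

# Keep only the names that actually exist as columns.
cols = df.columns.intersection(wanted)

# Assigning back to the same column subset overwrites those columns
# and leaves all other columns untouched.
df[cols] = df[cols] * 0.5

print(df)
```

If every name in the list is guaranteed to be a column, the intersection step can presumably be skipped and the list used directly, as in `df[wanted]`.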

Upvotes: 5

ALollz

Reputation: 59549

Use DataFrame.mul(Series):

# This generalizes to a different factor for each column.
s = pd.Series(0.5, index=['B', 'D', 'E'])

# Reindex to all columns and fill the gaps with 1, because
# DataFrame.mul(Series) hasn't implemented `fill_value`.
df.mul(s.reindex(df.columns).fillna(1))
 

          A         B         C         D         E
0  0.736739  0.092037  0.727951  0.086899  0.092297
1  0.047031  0.283759  0.103112  0.047058  0.025393
2  0.955045  0.377484  0.235842  0.355152  0.054702
3  0.426293  0.308971  0.304042  0.021517  0.399164
4  0.415225  0.230749  0.263462  0.310682  0.487341
5  0.936775  0.411212  0.073169  0.317453  0.070046
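
To illustrate the remark about different numbers per column, a small sketch follows. The fixed data and the `factors` values are made up for the example; `reindex(..., fill_value=1.0)` is used here as an equivalent of `reindex(...).fillna(1)`.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.0, 2.0], "B": [4.0, 6.0], "C": [3.0, 5.0], "D": [8.0, 10.0]}
)

# Hypothetical per-column factors; columns not listed should stay unchanged.
factors = pd.Series({"B": 0.5, "D": 0.25})

# Align the factors to every column, using 1 where no factor was given.
aligned = factors.reindex(df.columns, fill_value=1.0)

# DataFrame.mul with a Series matches the Series index against the column labels.
result = df.mul(aligned)

print(result)
```

Note that `mul` returns a new DataFrame, so `df` keeps its original values unless the result is assigned back.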

Upvotes: 1

Dev Khadka

Reputation: 5451

You can use the assign function, like this:

import pandas as pd
import random
# 6 rows x 5 columns of uniform random numbers in [0, 1]
data = [[random.uniform(0, 1) for _ in range(5)] for _ in range(6)]
df = pd.DataFrame(data, columns=["A", "B", "C", "D", "E"])

lst = ["B", "D", "E"]

# assign returns a new DataFrame; df itself is not modified
df.assign(**{col: df[col] * 0.5 for col in lst})
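
A short sketch of what the assign call does with a fixed DataFrame, so the values are easy to check. The data and the name `halved` are assumptions for illustration. Because the dictionary is unpacked into keyword arguments, this form needs the column names to be strings.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [4.0, 6.0], "D": [8.0, 10.0]})
lst = ["B", "D"]

# Build one keyword argument per listed column; existing columns with
# the same name are replaced in the returned copy.
halved = df.assign(**{col: df[col] * 0.5 for col in lst})

print(halved)  # new frame with B and D halved
print(df)      # original frame, unchanged
```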

Upvotes: 1

ansev

Reputation: 30920

First, create a copy of the original DataFrame so that it is not modified:

df1=df.copy()

Then you can use DataFrame.mul or *:

df1[['B','D','E']] = df1[['B','D','E']].mul(0.5)

You can also use DataFrame.div or /:

df1[['B','D','E']] = df1[['B','D','E']].div(2)
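
As a quick check that the two variants agree and that the copy protects the original, here is a minimal sketch. The fixed data and the names `via_mul` and `via_div` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [4.0, 6.0], "D": [8.0, 10.0]})
cols = ["B", "D"]

# Variant 1: multiply the selected columns by 0.5 on a copy.
via_mul = df.copy()
via_mul[cols] = via_mul[cols].mul(0.5)

# Variant 2: divide the selected columns by 2 on another copy.
via_div = df.copy()
via_div[cols] = via_div[cols].div(2)

# Both copies hold the same values; df still has the original ones.
print(via_mul.equals(via_div))
print(df)
```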

Upvotes: 1

Celius Stingher

Reputation: 18367

I like to solve this with a for loop, iterating through the list that contains the column names. You can also use it to add the results as columns with new names (Method 2):

import pandas as pd
import random

# 6 rows x 5 columns of uniform random numbers in [0, 1]
data = [[random.uniform(0, 1) for _ in range(5)] for _ in range(6)]
df = pd.DataFrame(data, columns=["A", "B", "C", "D", "E"])
cols = ["B","D","E"]
print(df)
for i in cols:
    df[i] = df[i] * 0.5
print(df)

Original dataframe (first print):

          A         B         C         D         E
0  0.245053  0.265646  0.379884  0.125120  0.244205
1  0.900575  0.340778  0.470371  0.201992  0.316867
2  0.286503  0.524801  0.904360  0.214806  0.841444
3  0.535986  0.345602  0.863335  0.607875  0.493185
4  0.950249  0.462833  0.419088  0.948236  0.476547
5  0.162888  0.672005  0.554368  0.494376  0.913913

Second dataframe (second print):

          A         B         C         D         E
0  0.245053  0.132823  0.379884  0.062560  0.122103
1  0.900575  0.170389  0.470371  0.100996  0.158434
2  0.286503  0.262400  0.904360  0.107403  0.420722
3  0.535986  0.172801  0.863335  0.303937  0.246592
4  0.950249  0.231416  0.419088  0.474118  0.238273
5  0.162888  0.336002  0.554368  0.247188  0.456957

Method 2:

for i in cols:
    df["new "+i] = df[i] * 0.5
print(df)

Output:

          A         B         C         D         E     new B     new D     new E
0  0.735067  0.213327  0.416205  0.235860  0.094208  0.106664  0.117930  0.047104
1  0.150027  0.524437  0.393283  0.783323  0.520855  0.262218  0.391661  0.260428
2  0.146858  0.328530  0.288445  0.101783  0.286224  0.164265  0.050892  0.143112
3  0.512124  0.302685  0.062246  0.152522  0.536951  0.151343  0.076261  0.268476
4  0.358646  0.928946  0.766012  0.808933  0.002960  0.464473  0.404466  0.001480
5  0.735067  0.436962  0.796247  0.499950  0.048898  0.218481  0.249975  0.024449
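
The result of Method 2 can presumably also be produced without an explicit loop. The sketch below is one possible variant, not part of the original answer: it halves the selected columns in one step, renames them with `add_prefix`, and joins them back onto the frame. The fixed data and the names `halved` and `result` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1.0, 2.0], "B": [4.0, 6.0], "D": [8.0, 10.0]})
cols = ["B", "D"]

# Halve the selected columns and prefix their labels with "new ".
halved = df[cols].mul(0.5).add_prefix("new ")

# Join on the row index: original columns first, new columns appended.
result = df.join(halved)

print(result)
```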

Upvotes: 0
