How to address data imported with pandas?

Question

I am using pandas to import a .dta file and numpy/sklearn to do some statistics on the dataset. I call the data sample and do the following:

#   import necessary packages
import pandas as pd
import numpy as np
import sklearn as skl

#   import data and give a little overview (col = var1-var5, 20 rows)
sample = pd.read_stata('sample_data.dta')
print('variables in dataset')
print(sample.dtypes)
print('first 5 rows and all cols')
print(sample[0:5])

# generate a new var
var6 = sample.var1/sample.var3

I get an error if I address a variable by its name directly (var1 instead of sample.var1). I find it a little tedious to always write the sample. prefix. Is there a nice way to refer to the variables directly by their name?
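For reference, pandas itself can resolve bare column names, as long as the expression is written as a string: DataFrame.eval and DataFrame.query look names up among the frame's columns, and a local Python variable can be pulled in with the @ prefix. A minimal sketch, assuming a small made-up DataFrame in place of sample_data.dta:

```python
import pandas as pd

# Hypothetical stand-in for the data read from sample_data.dta
sample = pd.DataFrame({
    "var1": [2.0, 9.0, 5.0],
    "var3": [1.0, 3.0, 2.0],
})

# Inside the expression string, columns are referenced by bare name.
# With an assignment target and the default inplace=False, eval returns
# a new DataFrame that includes the var6 column.
sample = sample.eval("var6 = var1 / var3")

# A local Python variable is referenced with the @ prefix.
threshold = 2.5
high = sample.query("var6 > @threshold")

print(sample["var6"].tolist())  # [2.0, 3.0, 2.5]
print(high.index.tolist())      # [1]
```

This keeps the bare names confined to the expression string, so nothing in the surrounding Python namespace gets overwritten.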

DeepSpace · Accepted Answer

See this contrived example. Usually I don't like messing with locals() and globals(), but I don't see a cleaner way:

class A:
    def __init__(self):
        self.var1 = 1
        self.var2 = 2

obj = A()

# at module level, locals() is the same dict as globals(), so this creates real variables
locals().update(obj.__dict__)

print(var1)  # 1
print(var2)  # 2
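One caveat worth spelling out: the trick relies on running at module level, where locals() returns the module's global namespace. Inside a function, updating the dict returned by locals() does not create real local variables, and a bare name that is never assigned in the function is compiled as a global lookup anyway. A minimal demonstration (the function name inside_function is just for illustration):

```python
class A:
    def __init__(self):
        self.var1 = 1
        self.var2 = 2


def inside_function():
    # Inside a function, the dict returned by locals() is not the real
    # local namespace, so this update has no visible effect on names.
    locals().update(A().__dict__)
    try:
        # var1 is never assigned here, so Python looks it up as a global
        return var1
    except NameError:
        return "NameError"


print(inside_function())  # NameError
```

If you do need this from inside a function, globals().update(...) is the variant that actually makes the names visible, with the same overwriting risk that applies at module level.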

Since you are working with a DataFrame, you will have to loop through df.columns instead of __dict__. Your code will be something along these lines:

import pandas as pd

df = pd.DataFrame({'a':[1]})

for col in df.columns:
    locals().update({col: df[col]})

print(a)
# 0    1
# Name: a, dtype: int64
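Applied to the question's data, the loop can be collapsed into a single update. A sketch, assuming a small made-up DataFrame in place of sample_data.dta; it uses globals(), which is the same dict as locals() at module level, and skips column names such as "var 7" that are not valid Python identifiers and so could never be typed as bare names anyway:

```python
import pandas as pd

# Hypothetical stand-in for sample_data.dta; "var 7" is deliberately
# not a valid Python identifier.
sample = pd.DataFrame({
    "var1": [2.0, 9.0, 5.0],
    "var3": [1.0, 3.0, 2.0],
    "var 7": [0, 0, 0],
})

# One update instead of a loop; only columns whose names are valid
# identifiers become module-level variables.
globals().update({
    col: sample[col]
    for col in sample.columns
    if isinstance(col, str) and col.isidentifier()
})

var6 = var1 / var3
print(var6.tolist())  # [2.0, 3.0, 2.5]
```

Each bound name is the column Series itself, so var6 is a Series aligned on sample's index and can be stored back with sample["var6"] = var6.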

You should be very careful when doing this, as it will overwrite any variable you have already defined with the same name, e.g.:

import pandas as pd

a = 7

print(a)  # 7

df = pd.DataFrame({'a':[1]})

for col in df.columns:
    locals().update({col: df[col]})

print(a)
# 0    1
# Name: a, dtype: int64
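If you still want the convenience, a guarded version can refuse to clobber anything. The helper below, export_columns, is hypothetical (not part of pandas); it skips columns whose names are not identifiers, are Python keywords, already exist in the target namespace, or would shadow a builtin such as sum:

```python
import builtins
import keyword

import pandas as pd


def export_columns(df, namespace):
    """Hypothetical helper (not a pandas API): bind each column of df to a
    name in namespace, skipping unsafe names. Returns the skipped columns."""
    skipped = []
    for col in df.columns:
        name = col if isinstance(col, str) else None
        unsafe = (
            name is None
            or not name.isidentifier()
            or keyword.iskeyword(name)
            or name in namespace        # existing variable, e.g. a = 7
            or hasattr(builtins, name)  # would shadow e.g. sum or len
        )
        if unsafe:
            skipped.append(col)
        else:
            namespace[name] = df[col]
    return skipped


a = 7
df = pd.DataFrame({"a": [1], "b": [2], "sum": [3]})

skipped = export_columns(df, globals())
print(a)           # 7
print(skipped)     # ['a', 'sum']
print(b.tolist())  # [2]
```

The returned list tells you which columns still have to be accessed explicitly, e.g. df["a"] or df["sum"].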
