Reputation: 1967
I am using pandas to import a .dta file and numpy/sklearn to do some statistics on the data set, which I call sample.
I do the following:
# import necessary packages
import pandas as pd
import numpy as np
import sklearn as skl
# import data and give a little overview (col = var1-var5, 20 rows)
sample = pd.read_stata('sample_data.dta')
print('variables in dataset')
print(sample.dtypes)
print('first 5 rows and all cols')
print(sample[0:5])
# generate a new var
var6 = sample.var1/sample.var3
I get an error if I address a variable by its name directly (var1 vs. sample.var1). I find it a little tedious to always include the sample. prefix. Is there any nice way to call the variables directly by their name?
Upvotes: 4
Views: 327
Reputation: 81654
See this contrived example. Usually I don't like messing with locals() and globals(), but I don't see a cleaner way:
class A:
    def __init__(self):
        self.var1 = 1
        self.var2 = 2

obj = A()
# copy the instance attributes into the current (module-level) namespace
locals().update(obj.__dict__)
print(var1)
print(var2)
>> 1
2
Since you are working with a dataframe, you will have to loop through df.columns instead of __dict__. Your code will be something along the lines of:
import pandas as pd
df = pd.DataFrame({'a':[1]})
for col in df.columns:
    locals().update({col: df[col]})
print(a)
>> 0    1
Name: a, dtype: int64
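One caveat, offered as a sketch rather than as part of the approach above: updating locals() only has this effect at module level, where locals() and globals() refer to the same namespace. Inside a function, the update does not create real local variables. The function names via_locals and via_globals below are hypothetical, and the column names are borrowed from the question purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'var1': [6.0], 'var3': [3.0]})


def via_locals(frame):
    # Inside a function, writing to locals() does not bind real local variables.
    locals().update({col: frame[col] for col in frame.columns})
    try:
        return var1  # falls back to a global lookup, which fails at this point
    except NameError:
        return None


def via_globals(frame):
    # Hypothetical helper: write the columns into the module namespace explicitly.
    globals().update({col: frame[col] for col in frame.columns})


locals_result = via_locals(df)  # None, because the name was never bound
via_globals(df)
var6 = var1 / var3              # both names now resolve to the dataframe columns
```

If the injection has to happen from inside a function, writing to globals() explicitly is the variant that actually makes the names visible afterwards.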
You should be very careful when doing this, as it will overwrite any variable you have already defined with the same name, e.g.:
import pandas as pd
a = 7
print(a)
>> 7
df = pd.DataFrame({'a':[1]})
for col in df.columns:
    locals().update({col: df[col]})
print(a)
>> 0    1
Name: a, dtype: int64
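To guard against this kind of silent overwrite, one option is to check for clashes before injecting anything. The following is only a minimal sketch: expose_columns is a hypothetical helper, not a pandas function, and it is demonstrated on a plain dictionary standing in for the namespace.

```python
import pandas as pd


def expose_columns(frame, namespace):
    """Copy each column of `frame` into `namespace`, refusing to overwrite names."""
    clashes = [col for col in frame.columns if col in namespace]
    if clashes:
        raise ValueError(f"would overwrite existing names: {clashes}")
    namespace.update({col: frame[col] for col in frame.columns})


# A plain dict stands in for the namespace; 'a' is already taken.
namespace = {'a': 7}

try:
    expose_columns(pd.DataFrame({'a': [1], 'b': [2]}), namespace)
    clash_detected = False
except ValueError:
    clash_detected = True  # nothing was injected, namespace['a'] is still 7

# A frame without clashing column names is injected as expected.
expose_columns(pd.DataFrame({'b': [2]}), namespace)
```

At module level you could pass globals() as the namespace argument, so that an existing variable such as a = 7 triggers the error instead of being replaced.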
Upvotes: 4