Reputation: 49
I have upwards of 4000 lines of code that analyze, manipulate, compare and plot 2 huge .csv documents. For readability and future publication, I'd like to convert it to object-oriented classes. I convert the documents to pd.DataFrames:
my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))
my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
I have functions that compare various aspects of each of the datasets and functions that only use the datasets individually. I want to convert this structure into a dataclass with methods for each dataframe.
I can't manipulate these dataframes through my class functions. I keep getting NameError: name 'self' is not defined. Here's my dataclass structure:
@dataclass
class Data:
    ser = pd.DataFrame
    # def __post_init__(self):
    #     self.ser = self.clean()
    def clean(self, ser):
        acceptcols = np.where(ser.loc[0, :] == '2')[0]
        data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
        data = ser.drop(0)
        data = ser.rename(columns={'': 'Time(s)'})
        data = ser.astype(float)
        data = ser.reset_index(drop=True)
        data.columns = [column.replace('1', '')
                        for column in ser.columns]
        return data
my_data1 = pd.DataFrame(np.random.randn(100, 9), columns=list('123456789'))
my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
# Attempt 1
new_data1 = Data.clean(my_data1) # Parameter "ser" unfilled
# Attempt 2
new_data1 = Data.clean(ser=my_data1) # Parameter "self" unfilled
# Attempt 3
new_data1 = Data.clean(self, my_data1) # Unresolved reference "self"
I have tried various forms of defining def clean(self and other stuff), but I think I just don't understand classes or class structure well enough. Documentation on classes and dataclasses always uses very rudimentary examples, and I've tried cutting and pasting a template to no avail. What am I missing?
Upvotes: 1
Views: 1261
Reputation: 2541
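You can first create an instance x of the class Data. When clean is called on the instance, Python passes x as self automatically, so you only supply ser.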
x = Data()
# Attempt 1
new_data1 = x.clean(my_data1)  # x is passed as self, my_data1 fills ser
# Attempt 2
new_data1 = x.clean(ser=my_data1)  # same call, with ser given by keyword
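If the goal is one object per dataset, the DataFrame can be stored on the instance instead of being passed to every method. The following is only a minimal sketch under that assumption: the annotation `ser: pd.DataFrame` (a colon instead of `=`) and the `column_means` method are illustrative and not part of the original code. Without an annotation, `ser = pd.DataFrame` is a plain class attribute, not a dataclass field, which is why `Data()` accepts no arguments in the question's version.

```python
from dataclasses import dataclass

import numpy as np
import pandas as pd


@dataclass
class Data:
    # The annotation makes `ser` a dataclass field, so the generated
    # __init__ requires it as an argument.
    ser: pd.DataFrame

    def column_means(self) -> pd.Series:
        # Hypothetical example method: `self` is supplied automatically
        # when the method is called on an instance.
        return self.ser.mean()


my_data2 = pd.DataFrame(np.random.randn(100, 4), columns=list('ABCD'))
d2 = Data(my_data2)          # the DataFrame is stored on the instance
means = d2.column_means()    # same as Data.column_means(d2)
```

Calling `Data.column_means(my_data2)` would also run, but it would bind the DataFrame itself to `self`, which is rarely what is intended.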
If I were you, I would not use a class this way. I would instead just define the following function
def clean(ser):
    acceptcols = np.where(ser.loc[0, :] == '2')[0]
    data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
    data = ser.drop(0)
    data = ser.rename(columns={'': 'Time(s)'})
    data = ser.astype(float)
    data = ser.reset_index(drop=True)
    data.columns = [column.replace('1', '')
                    for column in ser.columns]
    return data
and call it directly.
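Also, in your clean(), each step is applied to ser, the original input, rather than to data, the result of the previous step. Every assignment to data therefore overwrites the one before it, and only the final reset_index and the column renaming take effect. Is that what you intended?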
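If the steps were meant to build on each other, a chained version might look like the sketch below. It assumes the intended order is the one written in the question and that the raw file has a marker row at index 0 and an unnamed first column; the `raw` frame is made-up sample input, not the asker's data.

```python
import numpy as np
import pandas as pd


def clean(ser: pd.DataFrame) -> pd.DataFrame:
    # Positions of the columns whose first-row value is the string '2'.
    acceptcols = np.where(ser.loc[0, :] == '2')[0]
    # Keep column 0 plus the accepted columns.
    data = ser.iloc[:, np.insert(acceptcols, 0, 0)]
    # From here on, every step works on `data`, not on `ser`.
    data = data.drop(0)
    data = data.rename(columns={'': 'Time(s)'})
    data = data.astype(float)
    data = data.reset_index(drop=True)
    data.columns = [column.replace('1', '') for column in data.columns]
    return data


# Hypothetical raw input: row 0 holds the markers, the rest are readings.
raw = pd.DataFrame({
    '': ['0', '0.0', '0.5'],
    'a1': ['2', '1.0', '2.0'],
    'b1': ['3', '9.0', '9.0'],
})
cleaned = clean(raw)
print(cleaned)
```

With this sample input, only the unnamed first column and `a1` are kept, because `a1` is the only column marked '2' in row 0.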
Upvotes: 2