Reputation: 423
I have a DataFrame named df. I want to split my processing steps into separate functions so that I can reuse those functions in future programs.
# check if dataframe has duplicates
def duplicate_check():
    global df
    df = df.drop_duplicates(['datetime', 'tagname'])
    df.drop(['tagname'], axis=1, inplace=True)
    return df

df = duplicate_check()

# Split my dataframe array column to individual column
def array_split():
    global df
    date = df['datetime']
    df = df['value'] \
        .str.split('\t', expand=True).fillna('0') \
        .replace(r'\s+|\\n', ' ', regex=True) \
        .apply(pd.to_numeric)
    df['datetime'] = date  # Join date back to dataframe
    return df

df = array_split()

# split dataframe df to df and df_spec
def remove_duplicate_spec():
    global df, df_spec
    df_spec = df.loc[df[123].isin([1])]
    df = df.loc[df[123].isin([0])]
    df_spec = df_spec.drop_duplicates(119)
    return df, df_spec

df, df_spec = remove_duplicate_spec()
Question: Should I declare global df / df_spec inside each function? Is this best practice, or how can I improve the code further?
Upvotes: 0
Views: 51
Reputation: 4827
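The cleaner way is to pass your DataFrame as an argument to each function and return the result, instead of declaring it global. Each function then works on whatever DataFrame it is given, so it can be reused in other programs.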
import pandas as pd

df = pd.DataFrame({'datetime':[0,0,1,1,2], 'tagname':[0,0,1,1,2], 'other':range(95,100)})

def duplicate_check(df):
    # keep='last' keeps the last row of each duplicate group; the input df is not modified
    return df.drop_duplicates(['datetime', 'tagname'], keep='last').drop(['tagname'], axis=1)

duplicate_check(df)
DataFrame:
   datetime  tagname  other
0         0        0     95
1         0        0     96
2         1        1     97
3         1        1     98
4         2        2     99

Result of duplicate_check(df):
   datetime  other
1         0     96
3         1     98
4         2     99
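
The same pattern can be applied to the other two functions from the question. The following is only a sketch under a few assumptions: the name split_spec and the parameters flag_col and key_col are hypothetical (they stand in for the hard-coded columns 123 and 119), the sample data is made up for illustration, and duplicate_check here uses the question's default of keeping the first duplicate rather than keep='last'.

```python
import pandas as pd

def duplicate_check(df):
    # Drop rows duplicated on (datetime, tagname), then drop the tagname column.
    return df.drop_duplicates(['datetime', 'tagname']).drop(columns=['tagname'])

def array_split(df):
    # Expand the tab-separated 'value' column into one numeric column per field.
    out = (
        df['value']
        .str.split('\t', expand=True)
        .fillna('0')
        .replace(r'\s+|\\n', ' ', regex=True)
        .apply(pd.to_numeric)
    )
    out['datetime'] = df['datetime']  # join the date back; aligned on the index
    return out

def split_spec(df, flag_col=123, key_col=119):
    # Hypothetical name and parameters: rows flagged 1 go to df_spec
    # (de-duplicated on key_col), rows flagged 0 stay in the main frame.
    df_spec = df.loc[df[flag_col] == 1].drop_duplicates(subset=[key_col])
    df_main = df.loc[df[flag_col] == 0]
    return df_main, df_spec

# Made-up sample: three tab-separated fields per row, the last one is the flag.
raw = pd.DataFrame({
    'datetime': [0, 0, 1, 2, 3],
    'tagname': ['a', 'a', 'a', 'a', 'a'],
    'value': ['1\t7\t0', '1\t7\t0', '2\t8\t1', '3\t8\t1', '4\t9\t0'],
})

# Each step receives a DataFrame and returns a new one, so no globals are needed.
cleaned = raw.pipe(duplicate_check).pipe(array_split)
df_main, df_spec = split_spec(cleaned, flag_col=2, key_col=1)
```

Because nothing is modified in place, raw stays unchanged and each function can be imported and tested on its own.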
Upvotes: 2