Python Pandas Dataframe: Fast way to clean and manipulate data?

Question

I have multiple time series dataframes on which I keep doing the same things: naming the columns, dropping columns, adding columns, performing operations on columns, applying numpy.select to columns, and removing columns (lately I have been using a second dataframe to hold the columns that are no longer needed).

Is there any way I can write a function that does these things, so I don't have to keep copying and pasting the code every time I get my data ready?

Example (partly pseudocode):

cleaning

import pandas as pd

cols = ['date', 'open', 'high', 'low', 'close', 'volume']
df = pd.read_csv('data.csv', sep='\t', names=cols)  # tab-separated file without a header row
dcol = ['volume']
df.drop(dcol, axis=1, inplace=True)

multiple of these

df.insert(loc=5,column='name1',value=(df['operation']-df['operation']))

second df (used for hiding the values from the main df)

df2 = df.copy()

again, multiple of these

df2.insert(loc=6,column='name2',value=(df['operation']-df['operation']))

using numpy to select values from df2 and insert them into the main df

import numpy as np
conditions = [(cond1),(cond2)]
values1 = [(value1),(value2)]
values2 = [(value1),(value2)]
values3 = [(value1),(value2)]

# and finally three of these
df['randomname'] = np.select(conditions,values1)

So, is there a fast way to do this? Or do I just need to pull myself up by my bootstraps...


Answer

standard for my csv files
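The answer's original code is not preserved here. A minimal sketch of a reusable loader, assuming the headerless, tab-separated layout from the question; the name `load_prices`, the choice to parse `date` as a datetime, and the sample rows are illustrative assumptions, not the answerer's code.

```python
import io

import pandas as pd

# Column layout taken from the question's example.
DEFAULT_COLS = ['date', 'open', 'high', 'low', 'close', 'volume']


def load_prices(source, sep='\t'):
    """Read a headerless, tab-separated price file and name its columns.

    `source` is anything pd.read_csv accepts: a path such as 'data.csv'
    or a file-like object. Parsing 'date' as datetime is an assumption.
    """
    return pd.read_csv(source, sep=sep, names=DEFAULT_COLS, parse_dates=['date'])


# In-memory stand-in for 'data.csv' (hypothetical sample rows).
sample = io.StringIO(
    "2024-01-02\t10.0\t11.0\t9.5\t10.5\t1000\n"
    "2024-01-03\t10.5\t12.0\t10.0\t11.5\t1500\n"
)
df = load_prices(sample)
print(df)
```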

insert columns and as many as you wish
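One way to accept "as many columns as you wish" is to take keyword arguments and forward them to `DataFrame.assign`, which accepts callables that receive the DataFrame. This is a sketch; the helper name `add_columns` and the derived columns `hl_range` and `oc_change` are hypothetical.

```python
import pandas as pd


def add_columns(df, **new_cols):
    """Return a copy of df with any number of derived columns appended.

    Each keyword is a new column name; each value is a callable that
    takes the DataFrame and returns the column's values.
    """
    return df.assign(**new_cols)


df = pd.DataFrame({'open': [10.0, 10.5], 'high': [11.0, 12.0],
                   'low': [9.5, 10.0], 'close': [10.5, 11.5]})

df = add_columns(
    df,
    hl_range=lambda d: d['high'] - d['low'],    # hypothetical derived column
    oc_change=lambda d: d['close'] - d['open'],  # another hypothetical one
)
print(df)
```

`assign` always appends new columns at the end; when the position matters, as with `loc=` in the question, `df.insert` is the alternative.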

using numpy.select in function
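A possible way to wrap `numpy.select` so the same rule can be reused across dataframes: pass the conditions and choices as callables that are evaluated against whichever DataFrame comes in. The helper `select_column` and the up/down/flat rule are hypothetical examples.

```python
import numpy as np
import pandas as pd


def select_column(df, name, conditions, choices, default=0):
    """Return a copy of df with column `name` filled via numpy.select.

    `conditions` and `choices` are callables taking the DataFrame and
    returning lists of equal length; rows matching no condition get `default`.
    """
    df = df.copy()
    df[name] = np.select(conditions(df), choices(df), default=default)
    return df


df = pd.DataFrame({'open': [10.0, 10.5, 11.0], 'close': [10.5, 10.0, 11.0]})

# Hypothetical rule: label each row as up, down, or flat.
df = select_column(
    df,
    'direction',
    conditions=lambda d: [d['close'] > d['open'], d['close'] < d['open']],
    choices=lambda d: ['up', 'down'],
    default='flat',
)
print(df)
```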

dropping columns in function
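A minimal sketch of a column-dropping helper (the name `drop_columns` is hypothetical). `errors='ignore'` lets one list of unwanted names be applied to files that lack some of those columns.

```python
import pandas as pd


def drop_columns(df, cols):
    """Return a copy of df without `cols`; names not present are skipped."""
    return df.drop(columns=cols, errors='ignore')


df = pd.DataFrame({'close': [10.5, 11.5], 'volume': [1000, 1500],
                   'tmp': [0.0, 1.0]})
df = drop_columns(df, ['volume', 'tmp', 'not_there'])
print(df)
```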

and finally, applying functions through pipe to the df
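`DataFrame.pipe(func, *args, **kwargs)` calls `func(df, *args, **kwargs)`, so small helpers can be chained into one cleaning pipeline. The sketch below redefines simplified, hypothetical versions of the helpers so it runs on its own; the sample data and the `change`/`direction` columns are assumptions for illustration.

```python
import io

import numpy as np
import pandas as pd

COLS = ['date', 'open', 'high', 'low', 'close', 'volume']


def load_prices(source, sep='\t'):
    # Headerless, tab-separated file; column names are supplied here.
    return pd.read_csv(source, sep=sep, names=COLS, parse_dates=['date'])


def add_columns(df, **new_cols):
    # Each keyword value is a callable that receives the DataFrame.
    return df.assign(**new_cols)


def select_column(df, name, conditions, choices, default=0):
    df = df.copy()
    df[name] = np.select(conditions(df), choices(df), default=default)
    return df


def drop_columns(df, cols):
    return df.drop(columns=cols, errors='ignore')


sample = io.StringIO(
    "2024-01-02\t10.0\t11.0\t9.5\t10.5\t1000\n"
    "2024-01-03\t10.5\t12.0\t10.0\t10.0\t1500\n"
)

clean = (
    load_prices(sample)
    .pipe(add_columns, change=lambda d: d['close'] - d['open'])
    .pipe(
        select_column,
        'direction',
        conditions=lambda d: [d['change'] > 0, d['change'] < 0],
        choices=lambda d: ['up', 'down'],
        default='flat',
    )
    .pipe(drop_columns, ['volume', 'change'])  # helper column no longer needed
)
print(clean)
```

Keeping intermediate columns such as `change` in the main DataFrame and dropping them at the end of the chain is one alternative to maintaining a second DataFrame for the hidden values.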
