spidermarn
spidermarn

Reputation: 939

Python: How can I read csv and clean my data in a loop

I have the following code below which reads in csv and cleans them for 5 different financial assets. The code seems repetitive and inefficient. Is there a way I can do this in a loop instead of having to declare the 5 assets variables one-by-one?

import pandas as pd

# Input symbols
stock_symbol = "VTI"
ltb_symbol = "TLT"
stb_symbol = "IEI"
gld_symbol = "GLD"
comms_symbol = "DBC"

session_type = "open" # open, high, low or close

# Read csv for all symbols in Dataframes
stock_df = pd.read_csv("BATS_{}, 1D.csv".format(stock_symbol))
ltb_df = pd.read_csv("BATS_{}, 1D.csv".format(ltb_symbol))
stb_df = pd.read_csv("BATS_{}, 1D.csv".format(stb_symbol))
gld_df = pd.read_csv("BATS_{}, 1D.csv".format(gld_symbol))
comms_df = pd.read_csv("BATS_{}, 1D.csv".format(comms_symbol))

#Clean Dataframes
stock_df = stock_df.rename({'{}'.format(session_type): '{}'.format(stock_symbol)}, axis=1)\
            .loc[:,['time','{}'.format(stock_symbol)]]
ltb_df = ltb_df.rename({'{}'.format(session_type): '{}'.format(ltb_symbol)}, axis=1)\
            .loc[:,['time','{}'.format(ltb_symbol)]]
stb_df = stb_df.rename({'{}'.format(session_type): '{}'.format(stb_symbol)}, axis=1)\
            .loc[:,['time','{}'.format(stb_symbol)]]
gld_df = gld_df.rename({'{}'.format(session_type): '{}'.format(gld_symbol)}, axis=1)\
            .loc[:,['time','{}'.format(gld_symbol)]]
comms_df = comms_df.rename({'{}'.format(session_type): '{}'.format(comms_symbol)}, axis=1)\
            .loc[:,['time','{}'.format(comms_symbol)]]

Upvotes: 2

Views: 67

Answers (1)

jezrael
jezrael

Reputation: 863611

Create dictionary of DataFrames:

symbols = [stock_symbol,ltb_symbol,stb_symbol,gld_symbol,comms_symbol]
session_type = "open" # open, high, low or close

# Read csv for all symbols in Dataframes

dfs = {s: pd.read_csv("BATS_{}, 1D.csv".format(s)).rename({'{}'.format(session_type): '{}'.format(stock_symbol)}, axis=1).loc[:,['time','{}'.format(stock_symbol)]] for s in symbols}

You can also filter only 2 columns in read_csv by usecols parameter:

dfs = {s: pd.read_csv("BATS_{}, 1D.csv".format(s), usecols=['time',session_type]).rename({session_type: stock_symbol}, axis=1) for s in symbols}

And last is possible for DataFrames select by keys:

print (dfs['VTI'])

Upvotes: 3

Related Questions