alwayshope430

Reputation: 115

How to use Glob with Pandas to loop through a folder of CSVs?

I have a set of CSVs in a folder that I am trying to loop through for my pandas script. I am using glob to select the files ending in .csv but it just returns the same .csv file every time.

I am trying to accomplish the following:

  1. Use glob to select the folder containing .csvs and run the script on each individual .csv file in the folder
  2. Save the .csv filename as a variable that can be later applied to a .png file name

Basically, input the .csv file into the script, save the filename as a variable, run the rest of the script, and repeat until complete.

I am using Jupyter Notebook on macOS.

Here is my current code:

import yfinance as yf 
import matplotlib
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd 
import mplfinance as mpf 
import glob

path = r'/Users/chris/Desktop/Files'
files = glob.glob(path + "/*.csv")

for f in files:
    dfb = pd.read_csv(f,usecols=['Time','Balance'],index_col=0, parse_dates=True)

photoname = files+'.png'

dfb["Balance"] = dfb["Balance"].str.split(expand=True).iloc[:,0]  
dfb["Balance"] = dfb["Balance"].str.replace(',','').astype(float) 

df = yf.Ticker("DOGE-USD").history(period='max')
df = df.loc["2021-01-01":] 

newdfb = dfb['Balance'].resample('D').ohlc().dropna()  
newdfb.drop(['open','high','low'],axis=1,inplace=True) 
newdfb.columns = ['Balance']  

dates = [d.date() for d in newdfb.index]
newdfb.index = pd.DatetimeIndex(dates)
newdfb.index.name = 'Time'

dfc = df.join(newdfb, how='outer').dropna()
dfc.index.name = 'Date'

ap = mpf.make_addplot(dfc['Balance'])
mpf.plot(dfc,type='candle',addplot=ap)
print(address)

mpf.plot(dfc,type='candle',addplot=ap, savefig=photoname) #This saves as a photo

Upvotes: 0

Views: 4529

Answers (1)

alwayshope430

Reputation: 115

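The issue was that the lines following read_csv were not indented, so they were not part of the for f in files: loop. They ran only once, after the loop had finished, with dfb holding whichever CSV was read last. After indenting the lines underneath read_csv, they run once for each file.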
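To make the effect of the indentation concrete, here is a minimal sketch that needs no CSV files at all. The file names and the variable names (processed_outside, processed_inside) are made up for illustration and are not part of the original script.

```python
# Hypothetical file names standing in for the result of glob.glob(...)
files = ["a.csv", "b.csv", "c.csv"]

# Case 1: the follow-up work is NOT indented, so it runs once,
# after the loop has finished, and only sees the last file.
for f in files:
    current = f
processed_outside = [current]

# Case 2: the follow-up work IS indented, so it runs once per file.
processed_inside = []
for f in files:
    current = f
    processed_inside.append(current)

print(processed_outside)
print(processed_inside)
```

The first list ends up with a single entry (the last file), which matches the symptom of getting the same .csv every time; the second list contains every file.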

Because df = yf.Ticker("DOGE-USD").history(period='max') and df = df.loc["2021-01-01":] do not depend on the CSV being processed, moving them above the for loop is more efficient: the price history is then downloaded only once instead of once per file.

Here is the solution code:

import yfinance as yf 
import matplotlib
from matplotlib import pyplot as plt
import pandas as pd 
import mplfinance as mpf 
import glob

path = r'/Users/chris/Desktop/Files'
files = glob.glob(path + "/*.csv")

df = yf.Ticker("DOGE-USD").history(period='max')
df = df.loc["2021-01-01":] 

for f in files:
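    dfb = pd.read_csv(f, usecols=['Time','Balance'], index_col=0,
                      parse_dates=True)

    # Build the image name from the current file f, not from the list files
    # (list + str raises a TypeError), e.g. .../data.csv -> .../data.png
    photoname = f.rsplit('.', 1)[0] + '.png'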

    dfb["Balance"] = dfb["Balance"].str.split(expand=True).iloc[:,0]  
    dfb["Balance"] = dfb["Balance"].str.replace(',','').astype(float) 


    newdfb = dfb['Balance'].resample('D').ohlc().dropna()  
    newdfb.drop(['open','high','low'],axis=1,inplace=True) 
    newdfb.columns = ['Balance']  

    dates = [d.date() for d in newdfb.index]
    newdfb.index = pd.DatetimeIndex(dates)
    newdfb.index.name = 'Time'

    dfc = df.join(newdfb, how='outer').dropna()
    dfc.index.name = 'Date'

    ap = mpf.make_addplot(dfc['Balance'])
    mpf.plot(dfc,type='candle',addplot=ap)
  

    mpf.plot(dfc,type='candle',addplot=ap, savefig=photoname) 
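If only the file-name handling is of interest, the following is a small standalone sketch of looping over the CSVs in a folder and deriving a matching .png name for each one. It uses pathlib's with_suffix rather than string concatenation, which is a substitution on my part and not what the original script does. The helper name png_names_for_csvs and the demo file names are hypothetical, and the demo runs in a temporary folder rather than /Users/chris/Desktop/Files.

```python
import glob
import os
import tempfile
from pathlib import Path


def png_names_for_csvs(folder):
    """Map every .csv path in folder to a .png path with the same stem."""
    mapping = {}
    for f in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # with_suffix swaps only the final extension: data.csv -> data.png
        mapping[f] = str(Path(f).with_suffix(".png"))
    return mapping


# Demo in a throwaway folder with made-up file names
with tempfile.TemporaryDirectory() as tmp:
    for name in ("alpha.csv", "beta.csv", "notes.txt"):
        Path(tmp, name).write_text("Time,Balance\n")
    demo = png_names_for_csvs(tmp)
    for csv_path, png_path in demo.items():
        print(os.path.basename(csv_path), "->", os.path.basename(png_path))
```

Inside the real loop, the value for the current file could be passed as savefig, so each chart is saved next to the CSV it came from.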

Thank you to @Nathan Mills and @Daniel Goldfarb for providing the solution in the original post comments.

Upvotes: 1
