Louise Stevens
Louise Stevens

Reputation: 85

Iterating through files in a folder

I am using pandas to read and analyse large NHS prescription datasets. They are individually saved as csv files in a folder. I want to apply the same couple of lines of code to each of these files individually - so I want to individually read each dataframe with pandas and then extract relevant data and perform a basic calculation.

This is my code so far:

import pandas as pd
import glob

path = "/Volumes/TOSHIBA EXT/Datasets/2015"
all_files = glob.glob(path + "/*.CSV")

for f in all_files:
    pd.read_csv(f,index_col=None, header=0, usecols=[2,4,5], names=['PRACTICE','BNF NAME', 'ITEMS'])
    f=f[f['BNF NAME'].str.contains('Ampicillin' and 'Amoxicillin' and 'Co-Amoxiclav')]
    print pd.to_numeric(f['ITEMS']).sum()

However the following error is coming up..

TypeError: string indices must be integers, not str 

I would ideally like to define a function that selects the relevant rows containing the strings: "Ampicillin", "Amoxicillin" and "Co-Amoxiclav"; and then totals the number of items for each of these (i.e totalling the column named 'ITEMS') which I could then use in a for loop to iterate through each file.

Would appreciate any pointers as to how to avoid this error and achieve the above.

Many thanks! :)

Upvotes: 1

Views: 2262

Answers (2)

Kiv
Kiv

Reputation: 32728

There are two issues in your code:

1) The return value of pd.read_csv isn't stored in a variable. That's why you're getting a TypeError - you're trying to operate on f (the filename) as if it were a DataFrame.

2) The filter doesn't work because the expression evaluates from the inside out. To solve this you can create a list of options and then use isin to test if the target is in the list like so:

df["BNF Name"].isin(['Ampicillin', 'Amoxicillin', 'Co-Amoxiclav'])

Upvotes: 1

furas
furas

Reputation: 142651

f is filename but you use it as DataFrame - f['BNF NAME'].

You need

df = pd.read_csv(...)

and then you can use

df['BNF NAME']

df = df[ df['BNF NAME'] ... ]

df['ITEMS']

Upvotes: 2

Related Questions