Reputation: 3577
I have about 50 text files that I would like to open, perform a few operations on, and then save the output to a new file. For just one of these text files, this code does what I want:
#open file
df=pd.read_csv(r'F:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex.txt', sep='\t', nrows=80, skiprows=2)
#replace value names in 'Basic Stats'
df=df.replace({'Band 80$': 'LT50300281984137PAC00',
               'Band 79$': 'LT50300281984185XXX15',
               'Band 78$': 'LT50300821984249XXX03',
               'Band 77$': 'LT50300281985139PAC12',
               'Band 76$': 'LT50300281985171PAC04',
               'Band 75$': 'LT50300281986206XXX03',
               'Band 74$': 'LT50300281986238XXX03',
               'Band 73$': 'LT50300281987241XXX04',
               'Band 72$': 'LT50300281987257XXX03',
               'Band 71$': 'LT50300281987273XXX05',
               'Band 70$': 'LT50300281988212XXX03'}, regex=True)
#take a slice of the data
df['Basic Stats']=df['Basic Stats'].str.slice(13,20)
#sort the data
df=df.sort(columns='Basic Stats', axis=0, ascending=True)
I need to do these exact same operations on all 50 files. Is there a way to do this in pandas? Non-pandas answers would be helpful too.
Edit:
A snippet of the first 1000 characters of the file:
'Filename: F:\\Sheyenne\\Atmospherically Corrected Landsat\\Indices\\Main\\NDVI\\NDVI_stack\nROI: EVF: Layer: Main_allotments.shp (allotment1=A. Annex) [White] 3984 points\n\nBasic Stats\t Min\t Max\t Mean\t Stdev\t Num\tEigenvalue\n Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t 1\t 0.229509\n Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t 2\t 0.119217\n Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t 3\t 0.059111\n Band 4\t 0.313259\t0.678561\t0.525667\t0.048047\t 4\t 0.051338\n Band 5\t 0.374522\t0.746828\t0.583513\t0.055989\t 5\t 0.027913\n Band 6\t-0.000000\t0.749325\t0.330068\t0.314351\t 6\t 0.022561\n Band 7\t-0.000000\t0.819288\t0.600136\t0.170060\t 7\t 0.018126\n Band 8\t-0.000000\t0.687823\t0.450559\t0.084678\t 8\t 0.012942\n Band 9\t 0.332637\t0.776398\t0.549870\t0.085212\t 9\t 0.009261\n Band 10\t 0.386589\t0.848977\t0.635024\t0.087712\t 10\t 0.006628\n Band 11\t 0.265165\t0.822361\t0.594286\t0.075730\t 11\t 0.004517\n Band 12\t 0.191882\t0.539559\t0.343836\t0.0'
Edit:
This code:
d={'Band 80$': 'LT50300281984137PAC00',
   'Band 79$': 'LT50300281984185XXX15',
   'Band 78$': 'LT50300821984249XXX03',
   'Band 77$': 'LT50300281985139PAC12',
   'Band 76$': 'LT50300281985171PAC04',
   'Band 75$': 'LT50300281986206XXX03',
   'Band 74$': 'LT50300281986238XXX03',
   'Band 73$': 'LT50300281987241XXX04',
   'Band 72$': 'LT50300281987257XXX03',
   'Band 71$': 'LT50300281987273XXX05',
   'Band 70$': 'LT50300281988212XXX03'}
pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new")
os.mkdir(new) # create new dir for new files
os.chdir(new) # change to that directory
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
print 'Done Processing'
returns:
IOError: Initializing from file failed
Upvotes: 2
Views: 864
Reputation: 180411
d = {'Basic Stats':{'Band 80': 'LT50300281984137PAC00',
                    'Band 79': 'LT50300281984185XXX15',
                    'Band 78': 'LT50300821984249XXX03',
                    'Band 77': 'LT50300281985139PAC12',
                    'Band 76': 'LT50300281985171PAC04',
                    'Band 75': 'LT50300281986206XXX03',
                    'Band 74': 'LT50300281986238XXX03',
                    'Band 73': 'LT50300281987241XXX04',
                    'Band 72': 'LT50300281987257XXX03',
                    'Band 71': 'LT50300281987273XXX05',
                    'Band 70': 'LT50300281988212XXX03'}}
pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new")
os.mkdir(new) # create new dir for new files
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
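One thing to check about the IOError: new is created inside pth, so once it exists, os.listdir(pth) also returns the new directory itself, and handing a directory path to pd.read_csv fails, which can raise an IOError like the one above. A minimal guard, assuming everything you want to process is a regular file sitting directly in pth:
for f in os.listdir(pth):
    full = os.path.join(pth, f)
    # skip the freshly created "new" directory and any other non-file entries
    if not os.path.isfile(full):
        continue
    df = pd.read_csv(full, sep='\t', nrows=80, skiprows=2)
    # ... rest of the loop as above ...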
One part that does not make sense is replacing with the values from the dict and then slicing some of the string away; it would make more sense to use the correct values to start with. Another issue is that if a value in df['Basic Stats'] was not replaced, slicing it with .str.slice(13, 20) will leave you with an empty string, so you should make sure there will definitely be a match for each row or you will end up losing data.
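To illustrate the first point, here is a minimal sketch (using the df from the loop above) that maps each band label directly to the final string you want, so the separate slice step, and the risk of empty strings, goes away. The two entries shown are just examples, and the .str.strip() is an assumption to cope with the leading spaces visible in the file snippet:
# map each band label straight to the value you want to keep
band_map = {'Band 80': 'LT50300281984137PAC00'[13:20],
            'Band 79': 'LT50300281984185XXX15'[13:20]}   # ...and so on for the other bands
labels = df['Basic Stats'].str.strip()
# rows with no matching label keep their original text instead of ending up empty
df['Basic Stats'] = labels.map(band_map).fillna(df['Basic Stats'])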
Upvotes: 1
Reputation: 325
I'd wrap what you have in a function and make the filename a parameter to the function. Then you can just call the function in a loop to process each file. This isn't pandas-specific, but it should work.
If all the files to be processed are in one directory, you can use this answer to get a list of the files.
from os import listdir
from os.path import isfile, join
mypath = 'the directory name here'
filenames = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
def process_file(filename):
    df = pd.read_csv(filename, sep='\t', nrows=80, skiprows=2)
    # Rest of code goes here...
for filename in filenames:
    process_file(join(mypath, filename))
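For completeness, a minimal sketch of how the pieces might fit together; the "new" output directory and the "new_" file naming are assumptions borrowed from the loop in the question:
import os
import pandas as pd
from os.path import basename, join

out_dir = join(mypath, "new")          # hypothetical output directory
os.mkdir(out_dir)                      # create it once, before the loop

def process_file(path, out_dir):
    df = pd.read_csv(path, sep='\t', nrows=80, skiprows=2)
    # ... the same replace / slice / sort steps as in the question ...
    df.to_csv(join(out_dir, "new_{}".format(basename(path))), index=False, sep="\t")

for filename in filenames:             # filenames from the listing above
    process_file(join(mypath, filename), out_dir)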
Upvotes: 1