Reputation: 3577
I have about 50 text files that I would like to open, perform a few operations on, and then save the output to a new file. For just one of these text files, this code does what I want:
#open file
df=pd.read_csv(r'F:\Sheyenne\Statistics\NDVI_allotment\Text\A_Annex.txt', sep='\t', nrows=80, skiprows=2)
#replace value names in 'Basic Stats'
df=df.replace({'Band 80$': 'LT50300281984137PAC00',
               'Band 79$': 'LT50300281984185XXX15',
               'Band 78$': 'LT50300821984249XXX03',
               'Band 77$': 'LT50300281985139PAC12',
               'Band 76$': 'LT50300281985171PAC04',
               'Band 75$': 'LT50300281986206XXX03',
               'Band 74$': 'LT50300281986238XXX03',
               'Band 73$': 'LT50300281987241XXX04',
               'Band 72$': 'LT50300281987257XXX03',
               'Band 71$': 'LT50300281987273XXX05',
               'Band 70$': 'LT50300281988212XXX03'}, regex=True)
#take a slice of the data
df['Basic Stats']=df['Basic Stats'].str.slice(13,20)
#sort the data
df=df.sort(columns='Basic Stats', axis=0, ascending=True)
I need to do these exact same operations on all 50 files. Is there a way to do this in pandas? Non-pandas answers would be helpful too.
Edit:
A snippet of the first 1000 characters of the file:
'Filename: F:\\Sheyenne\\Atmospherically Corrected Landsat\\Indices\\Main\\NDVI\\NDVI_stack\nROI: EVF: Layer: Main_allotments.shp (allotment1=A. Annex) [White] 3984 points\n\nBasic Stats\t Min\t Max\t Mean\t Stdev\t Num\tEigenvalue\n Band 1\t 0.428944\t0.843916\t0.689923\t0.052534\t 1\t 0.229509\n Band 2\t-0.000000\t0.689320\t0.513170\t0.048885\t 2\t 0.119217\n Band 3\t 0.336438\t0.743478\t0.592622\t0.052544\t 3\t 0.059111\n Band 4\t 0.313259\t0.678561\t0.525667\t0.048047\t 4\t 0.051338\n Band 5\t 0.374522\t0.746828\t0.583513\t0.055989\t 5\t 0.027913\n Band 6\t-0.000000\t0.749325\t0.330068\t0.314351\t 6\t 0.022561\n Band 7\t-0.000000\t0.819288\t0.600136\t0.170060\t 7\t 0.018126\n Band 8\t-0.000000\t0.687823\t0.450559\t0.084678\t 8\t 0.012942\n Band 9\t 0.332637\t0.776398\t0.549870\t0.085212\t 9\t 0.009261\n Band 10\t 0.386589\t0.848977\t0.635024\t0.087712\t 10\t 0.006628\n Band 11\t 0.265165\t0.822361\t0.594286\t0.075730\t 11\t 0.004517\n Band 12\t 0.191882\t0.539559\t0.343836\t0.0'
Edit:
This code:
d={'Band 80$': 'LT50300281984137PAC00',
   'Band 79$': 'LT50300281984185XXX15',
   'Band 78$': 'LT50300821984249XXX03',
   'Band 77$': 'LT50300281985139PAC12',
   'Band 76$': 'LT50300281985171PAC04',
   'Band 75$': 'LT50300281986206XXX03',
   'Band 74$': 'LT50300281986238XXX03',
   'Band 73$': 'LT50300281987241XXX04',
   'Band 72$': 'LT50300281987257XXX03',
   'Band 71$': 'LT50300281987273XXX05',
   'Band 70$': 'LT50300281988212XXX03'}
pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new")
os.mkdir(new) # create new dir for new files
os.chdir(new) # change to that directory
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
print 'Done Processing'
returns:
IOError: Initializing from file failed
Upvotes: 2
Views: 864
Reputation: 180411
d = {'Basic Stats':{'Band 80': 'LT50300281984137PAC00',
                    'Band 79': 'LT50300281984185XXX15',
                    'Band 78': 'LT50300821984249XXX03',
                    'Band 77': 'LT50300281985139PAC12',
                    'Band 76': 'LT50300281985171PAC04',
                    'Band 75': 'LT50300281986206XXX03',
                    'Band 74': 'LT50300281986238XXX03',
                    'Band 73': 'LT50300281987241XXX04',
                    'Band 72': 'LT50300281987257XXX03',
                    'Band 71': 'LT50300281987273XXX05',
                    'Band 70': 'LT50300281988212XXX03'}}
pth = r'F:\Sheyenne\Statistics\NDVI_allotment\Text' # path to files
new = os.path.join(pth,"new")
os.mkdir(new) # create new dir for new files
# loop over each file and update
for f in os.listdir(pth):
    df = pd.read_csv(os.path.join(pth, f), sep='\t', nrows=80, skiprows=2)
    df = df.replace(d)
    df['Basic Stats'] = df['Basic Stats'].str.slice(13,20)
    df.sort(columns='Basic Stats', axis=0, ascending=True, inplace=True)
    # save data to csv
    df.to_csv(os.path.join(new, "new_{}".format(f)), index=False, sep="\t")
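One thing to check about the IOError: new is created inside pth, so once it exists, os.listdir(pth) also returns the new directory itself, and handing a directory path to pd.read_csv fails, which can raise an IOError like the one above. A minimal guard, assuming everything you want to process is a regular file sitting directly in pth:
for f in os.listdir(pth):
    full = os.path.join(pth, f)
    # skip the freshly created "new" directory and any other non-file entries
    if not os.path.isfile(full):
        continue
    df = pd.read_csv(full, sep='\t', nrows=80, skiprows=2)
    # ... rest of the loop as above ...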
One part that does not make sense is replacing with the values from the dict and then slicing some of the string away; it would make more sense to use the correct values to start with. Another issue is that if a value in df['Basic Stats'] was not replaced, slicing it with .str.slice(13, 20) will leave you with an empty string, so you should make sure there will definitely be a match for each row or you will end up losing data.
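To illustrate the first point, here is a minimal sketch (using the df from the loop above) that maps each band label directly to the final string you want, so the separate slice step, and the risk of empty strings, goes away. The two entries shown are just examples, and the .str.strip() is an assumption to cope with the leading spaces visible in the file snippet:
# map each band label straight to the value you want to keep
band_map = {'Band 80': 'LT50300281984137PAC00'[13:20],
            'Band 79': 'LT50300281984185XXX15'[13:20]}   # ...and so on for the other bands
labels = df['Basic Stats'].str.strip()
# rows with no matching label keep their original text instead of ending up empty
df['Basic Stats'] = labels.map(band_map).fillna(df['Basic Stats'])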
Upvotes: 1
Reputation: 325
I'd wrap what you have in a function and make the filename a parameter to the function. Then you can just call the function in a loop to process each file. This isn't pandas-specific, but it should work.
If all the files to be processed are in one directory, you can use this answer to get a list of the files.
from os import listdir
from os.path import isfile, join
mypath = 'the directory name here'
filenames = [ f for f in listdir(mypath) if isfile(join(mypath,f)) ]
def process_file(filename):
    df = pd.read_csv(filename, sep='\t', nrows=80, skiprows=2)
    # Rest of code goes here...
for filename in filenames:
    process_file(join(mypath, filename))
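For completeness, a minimal sketch of how the pieces might fit together; the "new" output directory and the "new_" file naming are assumptions borrowed from the loop in the question:
import os
import pandas as pd
from os.path import basename, join

out_dir = join(mypath, "new")          # hypothetical output directory
os.mkdir(out_dir)                      # create it once, before the loop

def process_file(path, out_dir):
    df = pd.read_csv(path, sep='\t', nrows=80, skiprows=2)
    # ... the same replace / slice / sort steps as in the question ...
    df.to_csv(join(out_dir, "new_{}".format(basename(path))), index=False, sep="\t")

for filename in filenames:             # filenames from the listing above
    process_file(join(mypath, filename), out_dir)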
Upvotes: 1