Reputation: 438
Suppose I have the following data sets that I successfully read into my notebook:
data1-format1-1991-1992.csv
data2-format1-1993-1994.csv
data3-format1-1995-1996.csv
data4-format2-1991-1992.csv
data5-format2-1993-1994.csv
data6-format2-1995-1996.csv
I can easily read these sets in, from their own directories like so:
files1 = [filename for filename in os.listdir(master_dir_1) if filename.startswith("data-")] # would do same for second set of files, in master_dir_2
for filename in files1:
df = pd.read_csv(filename)
print(df)
This prints them out in one frame - but I need to read them in, and then pd.merge()
each separate file based on the year in the file name. E.g., data1-format1-1991-1992.csv
gets joined with data1-format2-1991-1992.csv
, and so on.
The process would be to read in both sets from their own directories with the loop above, and then have another loop that combines and saves based on the matching years in the file names (I think). How do I split based on the years, and save each joined file separately, with the years in the saved csv name? I've got the process for reading in single data sets and joining/saving, but not for multiple read ins, joins, and saves.
This question gets me some of the way there, because of how .format()
is utilized.
Upvotes: 0
Views: 44
Reputation: 31
years = [name.split('-')[-2:] for name in names]
years = ['-'.join(y) for y in years]
years = set(years) #unique years
print(years)
dataformat = [name.split('-')[:2] for name in names]
dataformat = ['-'.join(df) for df in dataformat]
print(dataformat)
year_data = dict()
for year in years:
yragne=year.split('.')[0]
year_data[yragne] = pd.DataFrame([])
for df in dataformat:
try:
gg = pd.read_csv(rootdir+'/'+df+'-'+year,header = None)
print(gg)
year_data[yragne] = year_data[yragne].append(pd.read_csv(rootdir+'/'+df+'-'+year,header = None),True)
except:
pass
Upvotes: 1
Reputation: 16683
You can try a for loop with a .groupby
for year, x in df.groupby(['Year']):
x.to_csv(f'data1-format1-{year}.csv', index=False)
Upvotes: 1