Reputation: 17
I have many (>40) space-delimited .txt data files with an identical layout that I would like to read into Python for data processing and plotting. The files are model outputs from a parametric sweep of one parameter, which occupies one column in each data file. The parameter increments to the next value in each consecutive file.
The issue I am having is that I do not know how to write the for-loop for reading each data file into its own dataframe.
I have seen many answers suggesting 'pandas.read_csv' followed by concatenation. However, I do not want to concatenate the files into one dataframe, since I would like to plot each dataset separately. It doesn't make sense to me to concatenate everything into one dataframe only to have to separate out the datasets again afterwards.
import glob
import os
import pandas as pd
from pandas import Series, DataFrame

path = r'D:/user/data-folder/'
files = glob.glob(os.path.join(path, 'data-*.txt'))  # Added based on suggestions from similar questions

df1 = []
for f in files:
    # glob already returns the full path, so the folder is not prepended again
    df = pd.read_csv(f, sep=' ')
    df1.append(df)

print(df1)
Ideally, I would like to have each data file read into its own dataframe, numbered incrementally, e.g. 'df1_1', 'df1_2', etc. I could then manipulate each dataframe individually and plot the data against each other for comparisons.
Upvotes: 1
Views: 72
Reputation: 62373
Use pathlib to replace os & glob:
from pathlib import Path
import pandas as pd

data_path = Path(r'D:/user/data-folder')
data_files = data_path.glob('data-*.txt')
Store each DataFrame in a dict:

df_dict = dict()
for i, file in enumerate(data_files):
    df_dict[f'df_{i}'] = pd.read_csv(file, sep=' ')
Access an individual DataFrame by its key:
df_dict['df_1']
Plot all of the DataFrames:

for value in df_dict.values():
    value.plot()
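Since each file is one step of the parameter sweep, the order the files are read in may matter, and glob does not guarantee one. A minimal sketch along the same lines sorts the paths and keys the dict by file stem instead of a counter. The helper name `load_frames` and its default arguments are made up for illustration, and the space separator is taken from the question.

```python
from pathlib import Path

import pandas as pd


def load_frames(folder, pattern="data-*.txt", sep=" "):
    """Read every matching file into its own DataFrame, keyed by file stem."""
    # sorted() gives a repeatable, lexicographic order of the paths
    paths = sorted(Path(folder).glob(pattern))
    return {p.stem: pd.read_csv(p, sep=sep) for p in paths}
```

Each file then stays in its own DataFrame, reachable as e.g. `frames['data-1']` if a file with that name exists. Note that the sort is lexicographic, so 'data-10' comes before 'data-2' unless the numbers in the filenames are zero-padded.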
Upvotes: 1
Reputation: 10799
What about a list of dataframes? If you have:
../data/a.txt:
firstname,lastname,hobby
niles,crane,wine tasting
martin,crane,sitting in recliner
bob,bulldog,being annoying
../data/b.txt:
firstname,lastname,hobby
john,doe,doing stuff
jane,doe,being anonymous
humphrey,bogart,smoking and drinking
The code:
def main():

    from glob import glob
    from os.path import join
    import pandas as pd
    from pandas import DataFrame
    from contextlib import ExitStack

    local_path = "data/"
    filenames = glob(join(local_path, "*.txt"))

    # ExitStack closes every opened file when the with-block exits
    with ExitStack() as context_manager:
        files = [context_manager.enter_context(open(filename, "r")) for filename in filenames]

        # One DataFrame per file, collected in a list
        dataframes = []
        for file in files:
            dataframe = pd.read_csv(file)
            dataframes.append(dataframe)

    print(dataframes[0], end="\n\n")
    print(dataframes[1])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
  firstname lastname                hobby
0     niles    crane         wine tasting
1    martin    crane  sitting in recliner
2       bob  bulldog       being annoying

  firstname lastname                 hobby
0      john      doe           doing stuff
1      jane      doe       being anonymous
2  humphrey   bogart  smoking and drinking
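A shorter variant of the same idea, as a rough sketch: pd.read_csv accepts a file path directly and opens and closes the file itself, so the files do not have to be opened by hand. The function name `read_all` and its parameters are made up for this example.

```python
from glob import glob
from os.path import join

import pandas as pd


def read_all(folder, pattern="*.txt"):
    """Return one DataFrame per matching file, in sorted filename order."""
    filenames = sorted(glob(join(folder, pattern)))
    # read_csv handles opening and closing each file when given a path
    return [pd.read_csv(name) for name in filenames]
```

With the two sample files above in the folder, `read_all("data/")[0]` would be the dataframe for a.txt and `[1]` the one for b.txt, because the filenames are sorted before reading.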
Upvotes: 1