Svengineer91

Reputation: 17

How to import many .txt files in a loop but without concatenation?

I have many (>40) space-delimited .txt data files with an identical layout that I would like to read into Python for data processing and plotting. The files are model outputs from a parametric sweep of one parameter, which occupies one column in each data file; the parameter increments to its next value in each consecutive file.

The issue I am having is that I do not know how to write the for-loop for reading each data file into its own dataframe.

I have seen many answers suggesting 'pandas.read_csv' followed by concatenation; however, I do not want to concatenate the files into one dataframe, since I would like to plot each dataset separately. It doesn't make sense to me to concatenate everything into a single dataframe only to have to separate the datasets out again afterwards.

import glob
import os
import pandas as pd

path = r'D:/user/data-folder/'

files = glob.glob(os.path.join(path, 'data-*.txt'))  # Added based on suggestions from similar questions

df1 = []  # list that will hold one dataframe per file
for f in files:
    df = pd.read_csv(f, sep=' ')  # glob already returns the full path, so no prefix is needed
    df1.append(df)

print(df1)

Ideally, I would like each data file read into its own dataframe, numbered incrementally, e.g. 'df1_1', 'df1_2', etc. I could then manipulate each dataframe individually and plot the datasets against each other for comparison.

Upvotes: 1

Views: 72

Answers (2)

Trenton McKinney

Reputation: 62373

Use pathlib to replace os & glob

from pathlib import Path
import pandas as pd

get the files

data_path = Path(r'D:/user/data-folder')
data_files = data_path.glob('data-*.txt')

store them in a dict

df_dict = dict()
for i, file in enumerate(data_files):
    # keys will be 'df_0', 'df_1', ...
    df_dict[f'df_{i}'] = pd.read_csv(file, sep=' ')

recall a DataFrame

df_dict['df_1']
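
As a variation on the dict above (not part of the original answer), the dict could instead be keyed by each file's stem, so every DataFrame stays tied to the file it came from; 'data-001' below is just a hypothetical file name:

df_dict = {file.stem: pd.read_csv(file, sep=' ')
           for file in data_path.glob('data-*.txt')}

df_dict['data-001']  # the frame read from a hypothetical data-001.txt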

plot DataFrames

for value in df_dict.values():
    value.plot()
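
If the goal is to compare the runs on a single set of axes, as the question mentions, a minimal sketch along these lines could work; 'param' and 'output' are placeholder column names, not names taken from the actual files:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
for name, df in df_dict.items():
    # plot each run on the same axes, labelled by its dict key
    df.plot(x='param', y='output', ax=ax, label=name)
plt.show()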

Upvotes: 1

Paul M.

Reputation: 10799

What about a list of dataframes? If you have:

../data/a.txt:

firstname,lastname,hobby
niles,crane,wine tasting
martin,crane,sitting in recliner
bob,bulldog,being annoying

../data/b.txt:

firstname,lastname,hobby
john,doe,doing stuff
jane,doe,being anonymous
humphrey,bogart,smoking and drinking

The code:

def main():

    from glob import glob
    from os.path import join
    import pandas as pd
    from contextlib import ExitStack

    local_path = "data/"

    filenames = glob(join(local_path, "*.txt"))

    # ExitStack closes every opened file when the with block exits
    with ExitStack() as context_manager:
        files = [context_manager.enter_context(open(filename, "r")) for filename in filenames]

        dataframes = []
        for file in files:
            dataframe = pd.read_csv(file)
            dataframes.append(dataframe)

        print(dataframes[0], end="\n\n")
        print(dataframes[1])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

  firstname lastname                hobby
0     niles    crane         wine tasting
1    martin    crane  sitting in recliner
2       bob  bulldog       being annoying

  firstname lastname                 hobby
0      john      doe           doing stuff
1      jane      doe       being anonymous
2  humphrey   bogart  smoking and drinking
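
Since pd.read_csv accepts a path directly, the ExitStack and explicit open calls are not strictly necessary; a shorter sketch of the same idea (not the original answer) would be:

from glob import glob
from os.path import join

import pandas as pd

filenames = glob(join("data/", "*.txt"))
dataframes = [pd.read_csv(filename) for filename in filenames]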

Upvotes: 1
