newpyguy

Reputation: 31

Read multiple txt files into multiple dataframes and later concatenate all dataframes into one

I could only find topics about reading multiple txt files into one single dataframe. But I want to store each file as a separate dataframe (df1, df2, ...) and later concatenate them all into one dataframe. Is there a fast way to do this? Better yet, what is the fastest way? That's one big point for me. The file names should not be used; they have the format year.month.day.hour.minute.second, with no .txt extension at the end. Thank you in advance. Right now I am just reading everything into one dataframe:

import glob

import numpy as np
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("path_in_dir"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                              'D': np.float32, 'E': np.float32, 'F': np.float32,
                              'G': np.float32, 'H': np.float32})

    all_data = all_data.append(df, ignore_index=True)

Upvotes: 0

Views: 1111

Answers (3)

Gaurav Singhal

Reputation: 148

I didn't use the exact data structure; instead I created a few dummy files to demonstrate the use case.

import pandas as pd
import glob

datasets = []
for f in glob.glob("<Path to folder>"):
    df = pd.read_csv(f, sep=',',
                     names=('Col1', 'Col2', 'Col3', 'Col4'),
                     dtype={'Col1': str, 'Col2': int, 'Col3': float, 'Col4': str})
    datasets.append(df)
all_data = pd.concat(datasets, ignore_index=True)
print(all_data.head())
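
Adapted to the question's whitespace-delimited, extension-less files, the same list-and-concat pattern might look like this (a sketch that creates temporary dummy files standing in for the real timestamp-named ones; the column names are taken from the question):

```python
import glob
import os
import tempfile

import numpy as np
import pandas as pd

# Two small dummy files standing in for the real timestamp-named files.
tmpdir = tempfile.mkdtemp()
for name in ("2019.01.01.00.00.00", "2019.01.01.00.00.10"):
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write("1 2 3 4 5 6 7 8\n9 10 11 12 13 14 15 16\n")

cols = ('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H')
frames = []  # one dataframe per file, collected in a list
for f in glob.glob(os.path.join(tmpdir, "*")):
    frames.append(pd.read_table(f, sep=r"\s+", names=cols,
                                dtype={c: np.float32 for c in cols}))

# A single concat at the end is much faster than appending inside the loop.
all_data = pd.concat(frames, ignore_index=True)
print(all_data.shape)  # (4, 8)
```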

You can adapt this code to your use case.

Thanks

Upvotes: 0

Parfait

Reputation: 107567

Reconsider this approach: "I want to store them each as a different dataframe (df1, df2, ...) and later concatenate them." Instead, save each similar dataframe in a larger container such as a list or dictionary. This avoids flooding your global environment with many (potentially hundreds of) separate objects.

Below you have only two objects to maintain: 1) df_dict, with keys being df1, df2, ... and 2) all_data, where all dataframe elements are stacked together.

df_dict = {}

for i, f in enumerate(glob.glob("path_in_dir")):
    df_dict['df'+str(i+1)] = pd.read_table(f, delim_whitespace=True, 
                               names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                               dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                                      'D': np.float32,'E': np.float32, 'F': np.float32,
                                      'G': np.float32,'H': np.float32})
# MASTER COMPILED DATAFRAME
all_data = pd.concat(df_dict.values(), ignore_index=True)

# ACCESS FIRST THREE DATAFRAMES
df_dict['df1']
df_dict['df2']
df_dict['df3']

Upvotes: 1

kjmerf

Reputation: 4335

You could try something like:

import pandas as pd

df = pd.read_csv(r'your_file.txt', sep='\t')
df2 = pd.read_csv(r'your_second_file.txt', sep='\t')
df3 = pd.read_csv(r'your_third_file.txt', sep='\t')

master = pd.concat([df, df2, df3])
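
If there are more than a few files, the same idea might be generalized with glob so each file does not need its own variable (a sketch using temporary dummy tab-separated files; the file names are hypothetical):

```python
import glob
import os
import tempfile

import pandas as pd

# Two dummy tab-separated files (hypothetical names) to demonstrate.
tmpdir = tempfile.mkdtemp()
for i, name in enumerate(("your_file.txt", "your_second_file.txt")):
    with open(os.path.join(tmpdir, name), "w") as fh:
        fh.write("x\ty\n%d\t%d\n" % (i, i + 1))

# sorted() keeps the row order deterministic across platforms.
files = sorted(glob.glob(os.path.join(tmpdir, "*.txt")))
master = pd.concat((pd.read_csv(f, sep="\t") for f in files),
                   ignore_index=True)
print(master.shape)  # (2, 2)
```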

Upvotes: 0
