Reputation: 31
I could only find topics about reading multiple txt files into one single dataframe. But I want to store each of them as a different dataframe (df1, df2, ...)
and later concatenate them into one dataframe. Is there a fast way to do this? Better yet, what is the fastest way to do this? That is a big point for me. The file names should not be used; they have the format (year.month.day.hour.minute.second),
with no txt extension at the end. Thank you in advance.
Right now I am just reading everything into one dataframe:
import glob
import numpy as np
import pandas as pd

all_data = pd.DataFrame()
for f in glob.glob("path_in_dir"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                              'D': np.float32, 'E': np.float32, 'F': np.float32,
                              'G': np.float32, 'H': np.float32})
    all_data = all_data.append(df, ignore_index=True)
Upvotes: 0
Views: 1111
Reputation: 148
I didn't use your exact data structure; instead, I created a few dummy files to demonstrate the use case.
import pandas as pd
import glob

# Read each file into its own dataframe, collect them in a list,
# then concatenate once at the end.
datasets = []
for f in glob.glob("<Path to folder>"):
    df = pd.read_csv(f, sep=',',
                     names=('Col1', 'Col2', 'Col3', 'Col4'),
                     dtype={'Col1': str, 'Col2': int, 'Col3': float, 'Col4': str})
    datasets.append(df)

all_data = pd.concat(datasets, ignore_index=True)
print(all_data.head())
You can adapt this code to your case; a sketch of that adaptation follows.
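For example, applied to the whitespace-delimited files from the question, the same read-then-concat pattern might look roughly like this sketch (the bare "*" glob pattern is an assumption, since the question says the files have no extension):

import glob
import numpy as np
import pandas as pd

datasets = []
# The files have no extension, so a bare "*" pattern is assumed here.
for f in glob.glob("path_to_dir/*"):
    df = pd.read_table(f, delim_whitespace=True,
                       names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                       dtype=np.float32)
    datasets.append(df)

# Concatenate once at the end instead of appending inside the loop.
all_data = pd.concat(datasets, ignore_index=True)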
Thanks
Upvotes: 0
Reputation: 107567
Reconsider this approach: "I want to store them each as a different dataframe (df1, df2, ...) and later concatenate them." Instead, save each similar dataframe in a larger container such as a list or dictionary. This avoids flooding your global environment with many (potentially hundreds) of separate objects.
Below you have only two objects to maintain: 1) df_dict, whose keys are df1, df2, ..., and 2) all_data, where all dataframe elements are stacked together. (A list-based variant is sketched after the code.)
import glob
import numpy as np
import pandas as pd

df_dict = {}
for i, f in enumerate(glob.glob("path_in_dir")):
    df_dict['df' + str(i + 1)] = pd.read_table(f, delim_whitespace=True,
                                               names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                                               dtype={'A': np.float32, 'B': np.float32, 'C': np.float32,
                                                      'D': np.float32, 'E': np.float32, 'F': np.float32,
                                                      'G': np.float32, 'H': np.float32})
# MASTER COMPILED DATAFRAME
all_data = pd.concat(df_dict.values(), ignore_index=True)
# FIRST THREE DATAFRAMES
df_dict['df1']
df_dict['df2']
df_dict['df3']
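If the names df1, df2, ... are not actually needed, the other container mentioned above, a plain list, works the same way. A minimal sketch, reusing the question's reader settings (the single np.float32 dtype is shorthand for the per-column dict above):

import glob
import numpy as np
import pandas as pd

# Same reader as above, but collected in a list; the names df1, df2, ...
# are dropped and each dataframe is reached by position instead.
df_list = [pd.read_table(f, delim_whitespace=True,
                         names=('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                         dtype=np.float32)
           for f in glob.glob("path_in_dir")]

all_data = pd.concat(df_list, ignore_index=True)

first = df_list[0]   # the equivalent of df_dict['df1']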
Upvotes: 1
Reputation: 4335
You could try something like:
import pandas as pd
df = pd.read_csv(r'your_file.txt', sep = '\t')
df2 = pd.read_csv(r'your_second_file.txt', sep = '\t')
df3 = pd.read_csv(r'your_third_file.txt', sep = '\t')
master = pd.concat([df, df2, df3])
Upvotes: 0