Reputation: 327
I have a list of files stored in a directory, such as

filenames = [
    abc_1.txt
    abc_2.txt
    abc_3.txt
    bcd_1.txt
    bcd_2.txt
    bcd_3.txt
]
pattern = [abc]

I want to read multiple txt files into one dataframe, such that all files starting with abc end up in one dataframe, all files starting with bcd in another, and so on.
My code:

filenames = os.listdir(file_path)
expnames = []
for files in filenames:
    expnames.append(files.rsplit('_', 1)[0])
## expnames=[abc, bcd]

dfs = []
for exp in expnames:
    for files in filenames:
        if files.startswith(exp):
            dfs.append(pd.read_csv(file_path + files, sep=',', header=None))
big_frame = pd.concat(dfs, ignore_index=True)
My output contains duplicate rows due to the nested for loops.
Can someone help with this?
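For reference, the duplication can be reproduced without touching the filesystem. A minimal sketch using the example filenames above (no pandas needed): expnames ends up holding one prefix per file rather than per unique prefix, so each file matches several entries and would be read several times:

```python
filenames = ["abc_1.txt", "abc_2.txt", "abc_3.txt",
             "bcd_1.txt", "bcd_2.txt", "bcd_3.txt"]

# One prefix per file, NOT per unique prefix:
expnames = [f.rsplit('_', 1)[0] for f in filenames]
print(expnames)  # ['abc', 'abc', 'abc', 'bcd', 'bcd', 'bcd']

# Simulate the nested loops: each file matches its prefix 3 times,
# so it would be read 3 times.
matches = [f for exp in expnames for f in filenames if f.startswith(exp)]
print(len(matches))  # 18 reads instead of 6
```

Deduplicating the prefixes (e.g. with set()) removes the repeats.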
Upvotes: 0
Views: 1198
Reputation: 327
file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)
prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = []
for prefix in prefixes:
    dfs = [pd.read_csv(os.path.join(file_path, file), header=None)
           for file in filenames if file.startswith(prefix)]
    list_of_dfs.append(pd.concat(dfs, ignore_index=True))

final = pd.concat(list_of_dfs)
Upvotes: 0
Reputation: 9019
This will store your desired outputs in a list of dataframes called list_of_dfs, and then create a MultiIndex dataframe final from them, with the file prefixes (e.g. ['abc', 'bcd']) as the keys for the outermost index level:
import pandas as pd
import os

filenames = os.listdir(file_path)
prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = [
    pd.concat(
        [pd.read_csv(os.path.join(file_path, file), header=None)
         for file in filenames if file.startswith(prefix)],
        ignore_index=True,
    )
    for prefix in prefixes
]
final = pd.concat(list_of_dfs, keys=prefixes)
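A short usage sketch of the keys= behaviour, using made-up two-column frames standing in for the CSV contents: selecting one key of the outer index level recovers all rows that came from that group of files.

```python
import pandas as pd

# Hypothetical per-prefix frames standing in for the CSV contents:
abc = pd.DataFrame([[1, 2], [3, 4]])
bcd = pd.DataFrame([[5, 6]])

# keys= labels the outermost index level with each frame's prefix.
final = pd.concat([abc, bcd], keys=['abc', 'bcd'])

# All rows that came from the 'abc' files:
print(final.loc['abc'])
print(final.loc['abc'].shape)  # (2, 2)
```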
Upvotes: 1