Reputation: 327
I have a list of files stored in a directory, such as

filenames = [
    abc_1.txt
    abc_2.txt
    abc_3.txt
    bcd_1.txt
    bcd_2.txt
    bcd_3.txt
]
pattern = [abc]

I want to read multiple txt files into one dataframe, such that all files starting with abc end up in one dataframe, all files starting with bcd in another, and so on.
My code:

filenames = os.listdir(file_path)
expnames = []
for files in filenames:
    expnames.append(files.rsplit('_', 1)[0])
## expnames=[abc, bcd]

dfs = []
for exp in expnames:
    for files in filenames:
        if files.startswith(exp):
            dfs.append(pd.read_csv(file_path + files, sep=',', header=None))
big_frame = pd.concat(dfs, ignore_index=True)
My output contains duplicate rows due to the nested for loops.
Can someone help with this?
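For reference, the duplication can be reproduced without touching the filesystem. A minimal sketch using the example filenames above (no pandas needed): expnames ends up holding one prefix per file rather than per unique prefix, so each file matches several entries and would be read several times:

```python
filenames = ["abc_1.txt", "abc_2.txt", "abc_3.txt",
             "bcd_1.txt", "bcd_2.txt", "bcd_3.txt"]

# One prefix per file, NOT per unique prefix:
expnames = [f.rsplit('_', 1)[0] for f in filenames]
print(expnames)  # ['abc', 'abc', 'abc', 'bcd', 'bcd', 'bcd']

# Simulate the nested loops: each file matches its prefix 3 times,
# so it would be read 3 times.
matches = [f for exp in expnames for f in filenames if f.startswith(exp)]
print(len(matches))  # 18 reads instead of 6
```

Deduplicating the prefixes (e.g. with set()) removes the repeats.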
Upvotes: 0
Views: 1198
Reputation: 327
file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)
prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = []
for prefix in prefixes:
    dfs = [pd.read_csv(os.path.join(file_path, file), header=None)
           for file in filenames if file.startswith(prefix)]
    list_of_dfs.append(pd.concat(dfs, ignore_index=True))

final = pd.concat(list_of_dfs)
Upvotes: 0
Reputation: 9019
This will store your desired outputs in a list of dataframes called list_of_dfs, and then create a MultiIndex dataframe final from them, with the file prefixes (e.g. ['abc', 'bcd']) as the keys for the outermost index level:
import pandas as pd
import os

filenames = os.listdir(file_path)
prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = [
    pd.concat(
        [pd.read_csv(os.path.join(file_path, file), header=None)
         for file in filenames if file.startswith(prefix)],
        ignore_index=True,
    )
    for prefix in prefixes
]
final = pd.concat(list_of_dfs, keys=prefixes)
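A short usage sketch of the keys= behaviour, using made-up two-column frames standing in for the CSV contents: selecting one key of the outer index level recovers all rows that came from that group of files.

```python
import pandas as pd

# Hypothetical per-prefix frames standing in for the CSV contents:
abc = pd.DataFrame([[1, 2], [3, 4]])
bcd = pd.DataFrame([[5, 6]])

# keys= labels the outermost index level with each frame's prefix.
final = pd.concat([abc, bcd], keys=['abc', 'bcd'])

# All rows that came from the 'abc' files:
print(final.loc['abc'])
print(final.loc['abc'].shape)  # (2, 2)
```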
Upvotes: 1