TjS

Reputation: 327

Merge files with similar names into one dataframe

I have a list of files stored in a directory, such as:

filenames=[
        abc_1.txt
        abc_2.txt
        abc_3.txt

        bcd_1.txt
        bcd_2.txt
        bcd_3.txt
       ]

pattern=[abc]

I want to read multiple txt files into one dataframe per prefix: all files starting with abc go into one dataframe, all files starting with bcd into another, and so on.

My code:

filenames = os.listdir(file_path)
expnames = []
for files in filenames:
    expnames.append(files.rsplit('_', 1)[0])

##   expnames=[abc, bcd]

dfs = []
for exp in expnames:
    for files in filenames:
        if files.startswith(exp):
            dfs.append(pd.read_csv(file_path + files, sep=',', header=None))
big_frame = pd.concat(dfs, ignore_index=True)

My output contains duplicate rows because of the nested for loops.


Can someone help with this?

Upvotes: 0

Views: 1198

Answers (2)

TjS

Reputation: 327

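import os
import pandas as pd

file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)
# one entry per distinct prefix, e.g. 'abc' from 'abc_1.txt'
prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = []
for prefix in prefixes:
    # read every file sharing this prefix and stack them into one dataframe
    frames = [pd.read_csv(os.path.join(file_path, file), header=None)
              for file in filenames if file.startswith(prefix)]
    list_of_dfs.append(pd.concat(frames, ignore_index=True))

final = pd.concat(list_of_dfs)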
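
For reference, the duplicate rows in the question's code appear to come from expnames holding one entry per file rather than one per prefix, so the outer loop visits each prefix several times. The sketch below counts the reads without touching any files; the file names are the ones from the question, and reads and reads_fixed are illustrative names only.

```python
# File names taken from the question; no files are actually opened here.
filenames = ['abc_1.txt', 'abc_2.txt', 'abc_3.txt',
             'bcd_1.txt', 'bcd_2.txt', 'bcd_3.txt']

# One entry per file, so 'abc' and 'bcd' each appear three times.
expnames = [name.rsplit('_', 1)[0] for name in filenames]

# Every repeated prefix re-reads all of its matching files.
reads = [name for exp in expnames
         for name in filenames if name.startswith(exp)]

# De-duplicating the prefixes first reads each file once.
reads_fixed = [name for exp in sorted(set(expnames))
               for name in filenames if name.startswith(exp)]

print(len(reads), len(reads_fixed))
```

With the six files above, the original loop performs three times as many reads as there are files, which matches the duplicated rows.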

Upvotes: 0

rahlf23

Reputation: 9019

This will store your desired outputs in a list of dataframes called list_of_dfs and then create a MultiIndex dataframe final from them with the file prefixes (e.g. ['abc','bcd']) as the keys for the outermost index level:

import pandas as pd
import os

filenames = os.listdir(file_path)

prefixes = list(set(i.split('_')[0] for i in filenames))

list_of_dfs = [
    pd.concat(
        [pd.read_csv(os.path.join(file_path, file), header=None)
         for file in filenames if file.startswith(prefix)],
        ignore_index=True,
    )
    for prefix in prefixes
]

final = pd.concat(list_of_dfs, keys=prefixes)
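
If you would rather look up each group by name than index into a MultiIndex, one alternative is a dict of dataframes keyed by prefix. This is only a sketch: load_by_prefix is a hypothetical helper name, and it assumes every file in the directory is a headerless comma-separated file named like prefix_N.txt.

```python
import os
from collections import defaultdict

import pandas as pd


def load_by_prefix(file_path):
    """Return {prefix: dataframe}, reading each file in file_path once."""
    groups = defaultdict(list)
    # sorted() makes the row order within each group deterministic
    for name in sorted(os.listdir(file_path)):
        # 'abc_1.txt' -> 'abc', same rule as rsplit('_', 1)[0] in the question
        prefix = name.rsplit('_', 1)[0]
        groups[prefix].append(
            pd.read_csv(os.path.join(file_path, name), header=None)
        )
    # one concatenated dataframe per prefix
    return {prefix: pd.concat(frames, ignore_index=True)
            for prefix, frames in groups.items()}
```

Calling load_by_prefix(file_path)['abc'] would then give the combined rows of all abc_*.txt files. Grouping in a single pass over the directory also avoids the nested loop, so no file is read twice.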

Upvotes: 1
