TjS
TjS

Reputation: 327

Merge files with similar name convention to a dataframe

I have a list of files stored in directory such as

filenames=[
        abc_1.txt
        abc_2.txt
        abc_3.txt

        bcd_1.txt
        bcd_2.txt
        bcd_3.txt
       ]

pattern=[abc]

I want to read multiple txt files into one dataframe such that all files starting with abc will be in one dataframe then all all filename starting with bcd etc.

My code:

file_path = '/home/iolie/Downloads/test/'
filenames = os.listdir(file_path)


prefixes = list(set(i.split('_')[0] for i in filenames))

for prefix in prefixes:
    print('Reading files with prefix:',prefix)
    for file in filenames: 
        if file.startswith(prefix):
            print('Reading files:',file)
            list_of_dfs = [pd.concat([pd.read_csv(os.path.join(file_path, file), header=None) ],ignore_index=True)]
            final = pd.concat(list_of_dfs)

This code doesnt't append but overwrites the dataframe. Can someone help wih this?

Upvotes: 3

Views: 324

Answers (1)

jpp
jpp

Reputation: 164673

A better idea than creating an arbitrary number of unlinked dataframes is to output a dictionary of dataframes, where the key is the prefix:

from collections import defaultdict

filenames = ['abc_1.txt', 'abc_2.txt', 'abc_3.txt',
             'bcd_1.txt', 'bcd_2.txt', 'bcd_3.txt']

dd = defaultdict(list)

for fn in filenames:
    dd[fn.split('_')[0]].append(fn)

dict_of_dfs = {}
for k, v in dd.items():
    dict_of_dfs[k] = pd.concat([pd.read_csv(fn) for fn in v], ignore_index=True)

Upvotes: 3

Related Questions