Colton Medler

Reputation: 91

Python Pandas: Group files in a directory by similar filenames and concatenate dataframes in a specific order

My goal is to group .csv files in a directory by shared characteristics in the file name. My directory contains files with names:

I would like to group these files by the numbers that follow "Source" and "Receiver" in the file name (as shown below) so I can later concatenate them.

Group 1

Group 2

Any ideas?

Upvotes: 0

Views: 834

Answers (1)

gold_cy

Reputation: 14226

The question asks for pandas, so here is a pandas solution.

import pandas as pd

fnames = ['After_Source1_Receiver1.csv',
          'After_Source1_Receiver2.csv',
          'Before_Source1_Receiver1.csv',
          'Before_Source1_Receiver2.csv',
          'During1_Source1_Receiver1.csv',
          'During1_Source1_Receiver2.csv',
          'During2_Source1_Receiver1.csv',
          'During2_Source1_Receiver2.csv']

df = pd.DataFrame(fnames, columns=['names'])
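
If the names should come from the directory itself rather than a hard-coded list, something like the sketch below might work. The helper name `names_frame` and its `data_dir` argument are made up for illustration; they are not from the question.

```python
from pathlib import Path

import pandas as pd


def names_frame(data_dir):
    """Return a one-column frame holding the names of the .csv files in data_dir."""
    # glob('*.csv') only looks at the top level of data_dir, not subfolders.
    names = sorted(p.name for p in Path(data_dir).glob('*.csv'))
    return pd.DataFrame(names, columns=['names'])
```

Calling `names_frame` on your folder should give a frame with the same `names` column as `df` above.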

I don't know what you want to do with the end result, but this is how you can group them.

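# Capture the Source and Receiver numbers; str.extract returns them as columns 0 and 1.
pattern = r'Source(\d+)_Receiver(\d+)'
# Attach the captured numbers to the names, then group on both of them.
for _, g in pd.concat([df, df['names'].str.extract(pattern)], axis=1).groupby([0,1]):
    print(g.names)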

0      After_Source1_Receiver1.csv
2     Before_Source1_Receiver1.csv
4    During1_Source1_Receiver1.csv
6    During2_Source1_Receiver1.csv
Name: names, dtype: object
1      After_Source1_Receiver2.csv
3     Before_Source1_Receiver2.csv
5    During1_Source1_Receiver2.csv
7    During2_Source1_Receiver2.csv
Name: names, dtype: object
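
Since the title also mentions concatenating in a specific order, here is a rough sketch of one way to go from the groups to combined dataframes. It assumes the order you want is Before, During1, During2, After (my guess from the file names, so change `STAGE_ORDER` if that is wrong). `ordered_groups`, `concat_group`, and the extra `file` column are names I made up for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumed order of the filename prefixes; adjust to whatever order you need.
STAGE_ORDER = ['Before', 'During1', 'During2', 'After']
PATTERN = r'^(?P<stage>[^_]+)_Source(?P<source>\d+)_Receiver(?P<receiver>\d+)\.csv$'


def ordered_groups(fnames):
    """Map (source, receiver) to its file names, sorted by STAGE_ORDER."""
    names = pd.Series(list(fnames), name='names')
    parts = names.str.extract(PATTERN)  # columns: stage, source, receiver
    # Names that do not match the pattern come back as NaN and are dropped.
    table = pd.concat([names, parts], axis=1).dropna()
    rank = {stage: i for i, stage in enumerate(STAGE_ORDER)}
    # Prefixes missing from STAGE_ORDER map to NaN and sort to the end.
    table = table.assign(order=table['stage'].map(rank))
    table = table.sort_values('order', kind='stable')
    # groupby keeps the row order inside each group, so the sort carries over.
    return {key: g['names'].tolist()
            for key, g in table.groupby(['source', 'receiver'])}


def concat_group(data_dir, names):
    """Read the files in the given order and stack them into one frame."""
    # The 'file' column records which file each row came from.
    frames = [pd.read_csv(Path(data_dir) / name).assign(file=name)
              for name in names]
    return pd.concat(frames, ignore_index=True)
```

You would then loop over `ordered_groups(fnames).items()` and call `concat_group` on each list of names. The keys are tuples of strings such as `('1', '2')`, because the captured numbers are not converted to integers.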

Upvotes: 1
