Reputation: 22001

Extract subset from several file names using python

I have a lot of files in a directory with names like:

'data_2000151_avg.txt', 'data_2000251_avg.txt', 'data_2003051_avg.txt'...

Assume the name of one of them is stored in fname. I would like to extract a substring from each, like so:

fname.split('_')[1][:4]

This gives 2000 as the result. I want to collect these from all the files in the directory and create a unique list. How do I do that?

Upvotes: 1

Answers (3)

Joseph

Reputation: 721

You can use the os module to list the directory.

import os
dirname = 'PathToFile'
myuniquelist = []
for d in os.listdir(dirname):
    # The file names in the question start with 'data_', not the literal 'fname'
    if d.startswith('data_'):
        myuniquelist.append(d.split('_')[1][:4])

EDIT: Just saw your comment about wanting a set. After the for loop, add this line:

myuniquelist = list(set(myuniquelist))
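Note that list(set(...)) does not keep the order in which the values were first seen. If order matters, a minimal sketch along these lines may help; the names list below just reuses the file names from the question and stands in for the output of os.listdir().

```python
# File names taken from the question; in practice they would come from os.listdir()
names = ['data_2000151_avg.txt', 'data_2000251_avg.txt', 'data_2003051_avg.txt']

prefixes = [name.split('_')[1][:4] for name in names]

# dict keys are unique and keep insertion order (guaranteed since Python 3.7)
unique_in_order = list(dict.fromkeys(prefixes))

# Alternatively, sort the unique values instead of keeping first-seen order
unique_sorted = sorted(set(prefixes))
```

Either variant returns a plain list, so it can replace the list(set(...)) line directly.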

Upvotes: 1

Sergey Gornostaev

Reputation: 7797

To list the files in a directory you can use os.listdir(). To collect the unique values, a set comprehension is the best fit.

import os
dir_path = '.'  # placeholder: the directory that holds the files
data = {f.split('_')[1][:4] for f in os.listdir(dir_path)}
list(data)  # if you really need a list
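One caveat: os.listdir() returns every entry in the directory, and split('_')[1] raises an IndexError for a name without an underscore. A possible way to guard against that is to match each name against a pattern first. This is only a sketch; PATTERN and unique_prefixes are hypothetical names, and the pattern assumes the naming scheme shown in the question.

```python
import re

# Assumed naming scheme: 'data_', at least four digits, then '_avg.txt'
PATTERN = re.compile(r'data_(\d{4})\d*_avg\.txt')

def unique_prefixes(names):
    """Return the set of four-digit prefixes from the names that match PATTERN."""
    found = set()
    for name in names:
        match = PATTERN.fullmatch(name)
        if match:  # silently skip anything that does not fit the scheme
            found.add(match.group(1))
    return found

# Sample input mixing matching and unrelated names
names = ['data_2000151_avg.txt', 'data_2003051_avg.txt', 'README', 'notes.txt']
result = unique_prefixes(names)
```

With a real directory, unique_prefixes(os.listdir(dir_path)) would be the equivalent call.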

Upvotes: 0

Ilja Everilä

Reputation: 52937

If unique list means a list of unique values, then a combination of glob (in case the folder contains files that do not match the desired name format) and set should do the trick:

from glob import glob

uniques = {fname.split('_')[1][:4] for fname in glob('data_*_avg.txt')}
# In case you really do want a list
unique_list = list(uniques)

This assumes the files reside in the current working directory. Otherwise, prepend the path to the pattern, for example glob('path/to/data_*_avg.txt').
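When a path is included, glob returns it as part of each result, so an underscore anywhere in the directory name would shift the index used after split('_'). One way around this, sketched here with pathlib and a hypothetical helper called unique_prefixes, is to split only the file name itself. The temporary directory exists purely to make the example runnable.

```python
import tempfile
from pathlib import Path

def unique_prefixes(directory):
    """Return a sorted list of the unique four-character prefixes in directory."""
    # Path.glob yields full paths; .name drops the directory part before splitting
    return sorted({p.name.split('_')[1][:4]
                   for p in Path(directory).glob('data_*_avg.txt')})

# Demo in a throwaway directory whose name deliberately contains underscores
with tempfile.TemporaryDirectory(prefix='my_data_') as tmp:
    for name in ('data_2000151_avg.txt', 'data_2000251_avg.txt',
                 'data_2003051_avg.txt', 'other.txt'):
        (Path(tmp) / name).touch()
    result = unique_prefixes(tmp)
```

Wrapping each result of glob() in os.path.basename() before splitting should have the same effect if you prefer to stay with the glob module.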

Upvotes: 0
