Kamikaze K

Reputation: 191

Add column with filename wildcard

I have files that follow this naming pattern:

XXXX____________030621_120933_D.csv
YYYY____________030621_120933_E.csv
ZZZZ____________030621_120933_F.csv

I am using glob.glob and a for loop to read each file into a pandas DataFrame, and I will merge the DataFrames at the end. I want to add a column to each DataFrame that holds the file's prefix (XXXX, YYYY or ZZZZ).

I can create the column with df['ID'], but I want its value to come from the file name. What is the easiest way to grab the ID from the file name when reading the CSV with pandas?
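One way to grab the ID is a regular expression that mirrors the naming pattern above. This is a sketch under an assumption about the naming scheme: the ID is whatever precedes the first underscore, followed by a run of underscores, two six-digit groups and a one-letter suffix. FILENAME_RE and parse_id are hypothetical names, not part of pandas.

```python
import re

# Hypothetical pattern for names like 'XXXX____________030621_120933_D.csv':
# an ID with no underscores, a run of underscores, two six-digit groups
# and a single-letter suffix before '.csv'.
FILENAME_RE = re.compile(r'^(?P<id>[^_]+)_+\d{6}_\d{6}_[A-Za-z]\.csv$')


def parse_id(filename):
    """Return the leading ID of a matching file name, else raise ValueError."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f'unexpected file name: {filename!r}')
    return match.group('id')


for name in ['XXXX____________030621_120933_D.csv',
             'YYYY____________030621_120933_E.csv',
             'ZZZZ____________030621_120933_F.csv']:
    print(name, '->', parse_id(name))
```

Inside the loop you would call parse_id(os.path.basename(file_)), because glob.glob returns paths that include the directory. Unlike a fixed slice, a name that does not fit the pattern raises an error instead of silently producing a wrong ID.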

Upvotes: 0

Views: 70

Answers (1)

Babak Fi Foo

Reputation: 1048

If the file names follow the pattern you showed, this should work:

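import glob
import os

import pandas as pd

dir_path = 'path/to/your/directory'  # replace with the folder that holds the CSVs

frames = []
for file_ in glob.glob(os.path.join(dir_path, '*.csv')):
    df = pd.read_csv(file_)
    # The ID is the start of the file name, e.g. 'XXXX' in
    # 'XXXX____________030621_120933_D.csv'; [:4] assumes a 4-character ID
    df['ID'] = os.path.basename(file_)[:4]
    frames.append(df)

# DataFrame.append was removed in pandas 2.0, so concatenate once at the end
result = pd.concat(frames, ignore_index=True)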

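If your IDs are not always four characters long, adjust the slice. Because os.path.basename strips the directory first, the index counts from the start of the file name rather than from the start of the full path.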
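Another option, sketched here rather than taken from the answer above, is to skip the per-frame column and let pd.concat label the rows through its keys argument. It assumes the ID is everything before the first underscore; the temporary files and the value column exist only to make the example self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    # Throwaway files that mimic the naming pattern from the question
    for prefix, suffix in [('XXXX', 'D'), ('YYYY', 'E'), ('ZZZZ', 'F')]:
        path = Path(tmp) / f'{prefix}____________030621_120933_{suffix}.csv'
        pd.DataFrame({'value': [1, 2]}).to_csv(path, index=False)

    paths = sorted(Path(tmp).glob('*.csv'))
    frames = [pd.read_csv(p) for p in paths]
    # ID = text before the first underscore, e.g. 'XXXX'
    ids = [p.name.split('_')[0] for p in paths]

# keys= puts each file's ID in an outer index level named 'ID';
# reset_index(level='ID') turns that level into a column, and
# reset_index(drop=True) discards the per-file row numbers
result = (
    pd.concat(frames, keys=ids, names=['ID', 'row'])
    .reset_index(level='ID')
    .reset_index(drop=True)
)
print(result)
```

Here the ID ends up as the first column instead of the last. Like the loop version, pd.concat raises a ValueError if no CSV files were found, so an empty folder fails loudly rather than returning an empty frame.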

Upvotes: 1
