mattie_g

Reputation: 95

Ingesting and compiling multiple fixed width files with a loop

I would like to write clean code that reads and combines multiple fixed-width files, with lower maintenance and better readability, but I am missing something.

Namely, after updating the file names:

#update the names of the infiles
infile1 = 'file1.txt'
infile2 = 'file2.txt'
...
infile4 = 'file4.txt' 
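Since the names differ only by a number, one option is to keep them in a single list instead of separate variables. This is a minimal sketch that assumes every file follows the fileN.txt pattern shown above; adjust the range to the actual number of files:

```python
# Assumes the names follow the pattern file1.txt ... file4.txt
infiles = [f'file{i}.txt' for i in range(1, 5)]
print(infiles)  # ['file1.txt', 'file2.txt', 'file3.txt', 'file4.txt']
```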

I would like to turn this working step:

# read fixed width file
df1 = pd.read_fwf(infile1,
                header=None,
                widths=[sample widths],
                names=[sample names here]
                )
...
...
df4 = pd.read_fwf(infile4,
                header=None,
                widths=[sample widths],
                names=[sample names here]
                )
df=pd.concat([df1,df2,df3,df4])

where [sample widths] and [sample names here] are specific to my file and quite lengthy,

into something easier to read and maintain:

# DESIRED FORM
for i in [1,2,3,4]:
    df\i = pd.read_fwf(f'infile{i}',
                       header=None,
                       widths=[sample widths],
                       names=[sample names here]
                      )
df=pd.concat([df1,df2,df3,df4])

I feel I'm close, but I'm missing something simple in how I'm writing my loop. I get this error when I run it:

df\i = pd.read_fwf('infile'f'{i}',
^
SyntaxError: unexpected character after line continuation character

Thank you.
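For context: the backslash in df\i is read as Python's line-continuation character, so anything after it on the same line is a syntax error. Python also has no syntax for building a variable name such as df1 from a loop counter, and f'infile{i}' evaluates to the literal string 'infile1', not to the value stored in the variable infile1. A common workaround is to keep the frames in a container indexed by the file number. The sketch below uses a dict; the in-memory StringIO data, widths and column names are made-up stand-ins for the real files:

```python
import io

import pandas as pd

# An f-string only builds text: this is the string 'infile1',
# not the value stored in a variable called infile1
i = 1
label = f'infile{i}'
print(label)  # infile1

# Hypothetical stand-ins for file1.txt, file2.txt, ... (made-up layout)
sources = {
    1: io.StringIO("A  10\nB  20\n"),
    2: io.StringIO("C  30\n"),
}
widths = [3, 2]
names = ['code', 'value']

# A dict keyed by file number plays the role of df1, df2, ...
dfs = {}
for i, src in sources.items():
    dfs[i] = pd.read_fwf(src, header=None, widths=widths, names=names)

# Combine in file-number order, like pd.concat([df1, df2, ...])
df = pd.concat([dfs[i] for i in sorted(dfs)])
print(df)
```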

Upvotes: 0

Views: 350

Answers (1)

Oskar_U

Reputation: 482

Hi & welcome to Stack Overflow!

First, you could load the file names (or full paths, if you need them) into a list. Then create an initial DataFrame from the first file and add the remaining files to it in a loop:

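import pandas as pd

infiles = ['file_1.txt', ..., 'file_n.txt']
df = pd.read_fwf(infiles[0], header=None, widths=[sample widths],
                 names=[sample names here])

for i in range(1, len(infiles)):
    temp_df = pd.read_fwf(infiles[i], header=None, widths=[sample widths],
                          names=[sample names here])
    # DataFrame.append returns a new frame instead of changing df in place
    # (and was removed in pandas 2.0), so reassign the concatenated result
    df = pd.concat([df, temp_df])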
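A variant of the same approach reads every file inside a list comprehension and calls pd.concat once at the end, so the combined frame is not rebuilt on every iteration. This is a sketch: read_all is a hypothetical helper, and the StringIO objects, widths and column names are made-up stand-ins for the real file paths and layout.

```python
import io

import pandas as pd

# Hypothetical fixed-width layout shared by every file
WIDTHS = [3, 2]
NAMES = ['code', 'value']


def read_all(sources, widths=WIDTHS, names=NAMES):
    """Read each fixed-width source and stack the results into one DataFrame."""
    frames = [
        pd.read_fwf(src, header=None, widths=widths, names=names)
        for src in sources
    ]
    # ignore_index=True gives the combined frame a fresh 0..n-1 index
    return pd.concat(frames, ignore_index=True)


# In-memory stand-ins; in practice pass file paths such as 'file_1.txt'
infiles = [io.StringIO("A  10\nB  20\n"), io.StringIO("C  30\n")]
df = read_all(infiles)
print(df)
```

Because pd.read_fwf accepts either a path or a file-like object, the same helper works unchanged when infiles holds real file names.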

Upvotes: 1
