ASH

Reputation: 20302

How can we append multiple items to a list?

I am trying to loop through multiple text files, and get the name of each file along with the headers of each file. Basically, I'm trying to get the file name and the file schema, which will become a table in a database. Here is the code that I am testing.

import pandas as pd
import csv
import glob
import os

results = pd.DataFrame([])
#results=[]
filelist = glob.glob("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt")
number_of_lines = 2
for filename in filelist:
    with open(filename) as myfile:
        head = [next(myfile) for x in range(2)]
        results.append(filename)
        results.append(head)

When I run that code, I end up with this error.

TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid

I want to end up with something like this, for the first three files.

File1:

FFIEC CDR Call Schedule RICI 09302019.txt
IDRSSD  RCFDM708    RCFDM709    RCFDM710
    RE CNSTR LN RECD INVST:INDV IMPARMNT    RE CNSTR LN ALLW BAL: INDV IMPAIRMNT    RE CNSTR LN RECD INVST:COLL IMPARMNT

File2:

FFIEC CDR Call Schedule RICI 09302020.txt
IDRSSD  RCFDM708    RCFDM709    RCFDM710
    RE CNSTR LN RECD INVST:INDV IMPARMNT    RE CNSTR LN ALLW BAL: INDV IMPAIRMNT    RE CNSTR LN RECD INVST:COLL IMPARMNT

File3:

FFIEC CDR Call Schedule RICI 12312019.txt
IDRSSD  RCFDM708    RCFDM709    RCFDM710
    RE CNSTR LN RECD INVST:INDV IMPARMNT    RE CNSTR LN ALLW BAL: INDV IMPAIRMNT    RE CNSTR LN RECD INVST:COLL IMPARMNT

I have around 3,700 text files, so I really want to automate this process of logging file names and file schemas.

Upvotes: 0

Views: 153

Answers (1)

Matthew Cox

Reputation: 1194

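The error comes from calling results.append() on a DataFrame with a plain string: as the message says, only Series and DataFrame objects can be appended that way (and DataFrame.append was removed entirely in pandas 2.0). I'm curious if just adding the lines to a results list would be sufficient for your needs: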

import glob

import pandas as pd

# Use a list here rather than a dataframe
results = []
filelist = glob.glob("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt")
number_of_lines = 2
for filename in filelist:
    with open(filename) as myfile:
        # Read the first number_of_lines lines, dropping the trailing newline
        # (this assumes every file has at least that many lines)
        head = [next(myfile).rstrip("\n") for _ in range(number_of_lines)]
        # One row per file: the file name followed by its header lines
        results.append([filename, *head])

# You can build a dataframe from that list at the end if you desire
results_df = pd.DataFrame.from_records(results, columns=['filename', 'head_1', 'head_2'])
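
With around 3,700 files, a few of them may turn out to be empty or shorter than two lines, in which case next() would raise StopIteration partway through the loop. Below is a minimal sketch of one way to guard against that, using itertools.islice and padding short files with empty strings. The helper name read_heads and the padding behaviour are my own assumptions, not something from your code, and it stores only the base file name (via os.path.basename) since that is what your expected output shows.

```python
import glob
import os
from itertools import islice

import pandas as pd


def read_heads(pattern, number_of_lines=2):
    """Return one row per matching file: base file name plus its first lines.

    read_heads is a hypothetical helper name used for illustration only.
    """
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as myfile:
            # islice simply stops at end of file instead of raising StopIteration
            head = [line.rstrip("\n") for line in islice(myfile, number_of_lines)]
        # Assumption: pad short files with "" so every row has the same width
        head += [""] * (number_of_lines - len(head))
        rows.append([os.path.basename(path), *head])
    columns = ["filename"] + [f"head_{i + 1}" for i in range(number_of_lines)]
    return pd.DataFrame(rows, columns=columns)
```

You would call it with the same glob pattern as before, for example read_heads("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt"), and get back a dataframe with the columns filename, head_1 and head_2.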

Upvotes: 1
