Reputation: 20302
I am trying to loop through multiple text files and get the name of each file along with its header lines. Basically, I want the file name and the file schema, which will become a table in a database. Here is the code that I am testing.
import pandas as pd
import csv
import glob
import os
results = pd.DataFrame([])
#results=[]
filelist = glob.glob("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt")
number_of_lines = 2
for filename in filelist:
    with open(filename) as myfile:
        head = [next(myfile) for x in range(2)]
    results.append(filename)
    results.append(head)
When I run that code, I end up with this error.
TypeError: cannot concatenate object of type '<class 'str'>'; only Series and DataFrame objs are valid
I want to end up with something like this, for the first three files.
File1:
FFIEC CDR Call Schedule RICI 09302019.txt
IDRSSD RCFDM708 RCFDM709 RCFDM710
RE CNSTR LN RECD INVST:INDV IMPARMNT RE CNSTR LN ALLW BAL: INDV IMPAIRMNT RE CNSTR LN RECD INVST:COLL IMPARMNT
File2:
FFIEC CDR Call Schedule RICI 09302020.txt
IDRSSD RCFDM708 RCFDM709 RCFDM710
RE CNSTR LN RECD INVST:INDV IMPARMNT RE CNSTR LN ALLW BAL: INDV IMPAIRMNT RE CNSTR LN RECD INVST:COLL IMPARMNT
File3:
FFIEC CDR Call Schedule RICI 12312019.txt
IDRSSD RCFDM708 RCFDM709 RCFDM710
RE CNSTR LN RECD INVST:INDV IMPARMNT RE CNSTR LN ALLW BAL: INDV IMPAIRMNT RE CNSTR LN RECD INVST:COLL IMPARMNT
I have around 3,700 text files, so I really want to automate this process of logging file names and file schemas.
Reputation: 1194
I'm curious if just adding the lines to a results list would be sufficient for your needs:
import pandas as pd
import csv
import glob
import os
# Use a list here rather than a dataframe
results=[]
filelist = glob.glob("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt")
number_of_lines = 2
for filename in filelist:
    with open(filename) as myfile:
        # Read only the first number_of_lines lines of each file
        head = [next(myfile) for x in range(number_of_lines)]
    # One row per file: the file name followed by its header lines
    results.append([filename, *head])
# You can build a dataframe from that list at the end if you desire
# (the column names below assume number_of_lines == 2)
results_df = pd.DataFrame.from_records(results, columns=['filename', 'head_1', 'head_2'])
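If it helps, here is a rough sketch of the same idea wrapped in small functions, using only the standard library. The names read_header_lines, collect_schemas and format_report are made up for this example, and the utf-8 encoding with errors="replace" is an assumption about your files, so adjust it if they are encoded differently. It uses itertools.islice instead of next(), so a file with fewer than two lines gets padded with empty strings rather than raising StopIteration, and it stores the base name of each file rather than the full path.

```python
import glob
import os
from itertools import islice


def read_header_lines(path, n_lines=2, encoding="utf-8"):
    """Return up to n_lines leading lines of a text file, newlines stripped."""
    # The encoding is an assumption; errors="replace" avoids decode failures.
    with open(path, encoding=encoding, errors="replace") as fh:
        return [line.rstrip("\r\n") for line in islice(fh, n_lines)]


def collect_schemas(pattern, n_lines=2):
    """Build one record per file: [base name, header line 1, ..., header line n]."""
    records = []
    for path in sorted(glob.glob(pattern)):
        head = read_header_lines(path, n_lines)
        # Pad short files so every record has the same number of fields.
        head += [""] * (n_lines - len(head))
        records.append([os.path.basename(path), *head])
    return records


def format_report(records):
    """Render records in the 'File1: / name / header lines' layout."""
    lines = []
    for i, (name, *head) in enumerate(records, start=1):
        lines.append(f"File{i}:")
        lines.append(name)
        lines.extend(head)
    return "\n".join(lines)
```

You would call it as print(format_report(collect_schemas("C:\\Users\\ryans\\OneDrive\\Desktop\\test\\*.txt"))), and the list returned by collect_schemas can be passed to pd.DataFrame.from_records with the same three column names as above if you still want a dataframe.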