I have approximately 100 text files of clinical notes, each consisting of 1-2 paragraphs. The files are named doc_1.txt through doc_179.txt. I would like to save the text from each file into a .csv file with two columns with the headers `id` and `text`. The `id` column holds the name of each file.
For example, `doc_1` is the file name and becomes the id, and the text in `doc_1` is stored in the `text` column. The desired result is below:
| id | text |
|:-----:|:----:|
| doc_1 | abcf |
| doc_2 | efrf |
| doc_3 | gvni |
So far I have only viewed the text and have not determined the most practical way to achieve this result.
Here is an update to the solution that was provided to me, which resolved my problem.
```python
import glob

import pandas as pd

files_list = glob.glob("*.txt")  # every .txt file in the current directory
df = pd.DataFrame(columns=["id", "text"])

for file in files_list:
    with open(file) as f:
        txt = f.read()  # to retrieve the text in the file
    file_name = file.split(".")[0]  # to remove file type
    # DataFrame.append was removed in pandas 2.0, so add the row with .loc instead
    df.loc[len(df)] = [file_name, txt]
```
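`glob.glob` returns files in arbitrary order, so the rows may not come out as doc_1, doc_2, doc_3 like the desired table, and a plain `sorted()` would put doc_10 before doc_2. A minimal sketch for numeric ordering, assuming every file name follows the `doc_<number>.txt` pattern; `doc_number` and `sort_docs` are hypothetical helper names:

```python
import re


def doc_number(file_name: str) -> int:
    """Hypothetical helper: pull the integer out of names like 'doc_12.txt'."""
    match = re.search(r"doc_(\d+)\.txt$", file_name)
    if match is None:
        raise ValueError(f"unexpected file name: {file_name}")
    return int(match.group(1))


def sort_docs(file_names: list[str]) -> list[str]:
    """Order file names numerically (doc_2 before doc_10), not alphabetically."""
    return sorted(file_names, key=doc_number)


print(sort_docs(["doc_10.txt", "doc_2.txt", "doc_1.txt"]))
# ['doc_1.txt', 'doc_2.txt', 'doc_10.txt']
```

In the loop above this would be used as `files_list = sort_docs(glob.glob("*.txt"))`.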
Assume you have a list of files:
```python
import pandas as pd  # remove if already imported
# ...
files_list = ["doc_1.txt", "doc_2.txt", ..., "doc_179.txt"]
```
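Rather than typing every name by hand, the list can also be collected from the folder. A minimal sketch using `pathlib`; `list_note_files` and its `notes_dir` parameter are hypothetical names, and the default assumes the notes sit in the current directory:

```python
from pathlib import Path


def list_note_files(notes_dir: str = ".") -> list[str]:
    """Return the paths of all .txt files in notes_dir (hypothetical helper).

    Path.glob yields files in arbitrary order, so the result is sorted
    (alphabetically) to make it stable between runs.
    """
    return sorted(str(p) for p in Path(notes_dir).glob("*.txt"))
```

If `notes_dir` is not the current directory, the returned entries include the folder (e.g. `notes/doc_1.txt`), so `Path(file).stem` is a safer way to get the id than `file.split(".")[0]`.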
Create a DataFrame with the necessary columns:
```python
df = pd.DataFrame(columns=["id", "text"])
```
Iterate through each file to read its text, add it to the DataFrame, and then save the DataFrame into a CSV file:
```python
for file in files_list:
    with open(file) as f:
        txt = f.read()  # to retrieve the text in the file
    file_name = file.split(".")[0]  # to remove file type
    # DataFrame.append was removed in pandas 2.0, so add the row with .loc instead
    df.loc[len(df)] = [file_name, txt]  # add row to DataFrame

df.to_csv("result.csv", sep="|", index=False)  # export DataFrame into csv file
```
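To confirm the export worked, the file can be read back with the same `sep`. A minimal sketch on two made-up rows (the second contains a line break, as multi-paragraph notes would), written to a temporary folder instead of `result.csv`:

```python
import os
import tempfile

import pandas as pd

# Illustrative rows only; real ids and text come from the .txt files
df = pd.DataFrame({"id": ["doc_1", "doc_2"], "text": ["abcf", "line one\nline two"]})

with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "result.csv")
    df.to_csv(out_path, sep="|", index=False)
    # Read back with the same separator; the multi-line note was quoted on export
    round_trip = pd.read_csv(out_path, sep="|")

print(round_trip.equals(df))
```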
Feel free to change the name of the output CSV file (`result.csv`) and the character used for `sep`.
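It is advisable not to use a separator that already appears in the text of any of the files. (For example, if any of the text files contain commas, do not use `,` as the `sep` value.) Note that `to_csv` quotes any field containing the separator or a line break by default, so the data still reads back correctly with `pd.read_csv`; an unused separator mainly keeps the file easy to handle with tools that split on that character naively.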
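A minimal sketch of that default quoting, using made-up text that contains commas and a line break with the default `sep=","`:

```python
import io

import pandas as pd

# Illustrative text containing both commas and a line break
df = pd.DataFrame({"id": ["doc_1"], "text": ["first, second,\nthird"]})

# Default sep="," with minimal quoting: the text field is wrapped in double quotes
csv_text = df.to_csv(index=False)
print(csv_text)

# Parsing the quoted output restores the original text, commas included
parsed = pd.read_csv(io.StringIO(csv_text))
print(parsed.loc[0, "text"] == df.loc[0, "text"])
```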