Reputation: 39
I have a text file that essentially looks like this:
Number|Name|Report
58|John|John is great
John is good
I like John
[Report Ends]
and repeats over and over for different people.
I want to turn this into a dataframe like the following
Number Name Report
58 John John is great John is good I like John [Report Ends]
Using the line
pd.read_csv('/Path', sep="|",header=0)
I have gotten the correct column names, and the first row is correct up until the "Report" section. I think the "Report" part messes everything up because it spans several lines in the text file. How should I fit the Report data into the dataframe?
Upvotes: 0
Views: 238
Reputation: 1960
With a few lines of manual parsing, you can extract the info and adapt it before reading it into your dataframe.
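import pandas as pd

with open('info.txt', 'r') as fp:
    info = fp.readlines()

df_dicts = []
cd = None  # dict for the record currently being built
for line in info[1:]:  # skip the header line
    line = line.strip()
    if '|' in line:
        # A pipe marks the first line of a new record.
        cd = {}
        df_dicts.append(cd)
        # maxsplit=2 keeps any later pipes on this line inside the report text.
        cd['Number'], cd['Name'], cd['Report'] = line.split('|', 2)
    else:
        # Continuation line: append it to the current report.
        cd['Report'] += " " + line

print(pd.DataFrame(df_dicts))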
If these plain string operations (`replace`, `'|' in line`, `split`) turn out to be too general for your data, you'll have to start looking into regex.
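For what it's worth, a regex version might look roughly like the sketch below. It assumes that every record starts with a purely numeric Number field followed by a pipe, which is a guess based on the sample in the question. The names `RECORD_START` and `parse_reports` are made up for illustration, and the sample lines are copied from the question rather than read from a file.

```python
import re
import pandas as pd

# Assumed layout: a record's first line is "<digits>|<name>|<report text>".
RECORD_START = re.compile(r'^(\d+)\|([^|]*)\|(.*)$')

def parse_reports(lines):
    """Collect multi-line reports into one row per record."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        match = RECORD_START.match(line)
        if match:
            number, name, report = match.groups()
            records.append({'Number': int(number), 'Name': name, 'Report': report})
        elif records:
            # Continuation line: append it to the most recent report.
            records[-1]['Report'] += ' ' + line
        # Lines before the first record (such as the header) are skipped.
    return pd.DataFrame(records, columns=['Number', 'Name', 'Report'])

sample = [
    'Number|Name|Report',
    '58|John|John is great',
    'John is good',
    'I like John',
    '[Report Ends]',
]
df = parse_reports(sample)
print(df)
```

Because only lines that begin with digits and a pipe start a new record, a pipe that appears in the middle of a report no longer gets mistaken for a new row. For a real file you would pass the open file object (or `fp.readlines()`) instead of `sample`.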
Upvotes: 1