Nick

Reputation: 39

What would be the best way to convert a text file to a pandas dataframe?

I have a text file that essentially looks like this:

Number|Name|Report
58|John|John is great

John is good

I like John
[Report Ends]

This pattern repeats over and over for different people.

I want to turn this into a dataframe like the following:

Number Name Report
58     John John is great John is good I like John [Report Ends]

Using pd.read_csv('/Path', sep="|", header=0), I get the correct column names, and the first row is correct up to the "Report" column. I think the "Report" field breaks the parsing because it spans several lines in the text file. How should I fit the Report data into the dataframe?

Upvotes: 0

Views: 238

Answers (1)

Lukas Schmid

Reputation: 1960

With a few lines of manual parsing, you can extract the info and adapt it before reading it into your dataframe.

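import pandas as pd

with open('info.txt', 'r') as fp:
    info = fp.readlines()

df_dicts = []
cd = None  # the record currently being filled
for line in info[1:]:  # skip the header line
    line = line.strip()
    if not line:
        continue  # ignore blank lines inside a report
    if '|' in line:
        # A line containing '|' starts a new record; split at most twice
        # so that any further '|' stays in the report text.
        cd = {}
        df_dicts.append(cd)
        cd['Number'], cd['Name'], cd['Report'] = line.split('|', 2)
    elif cd is not None:
        # Continuation line: append it to the current report.
        cd['Report'] += " " + line

print(pd.DataFrame(df_dicts))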

If the '|' in line check turns out to be too general (for example, if a continuation line of a report can itself contain a '|'), you'll have to start looking into regex.
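As a rough illustration of the regex route, here is a minimal sketch. It assumes that a new record always starts with a purely numeric Number, followed by a Name that contains no '|'. The names parse_reports and RECORD_START are made up for this example, and the sample text is copied from the question.

```python
import re
import pandas as pd

# Assumption: a record starts with digits, then "|Name|", then the report text.
RECORD_START = re.compile(r"^(\d+)\|([^|]*)\|(.*)$")

def parse_reports(lines):
    """Hypothetical helper: group multi-line reports into one row each."""
    records = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines inside a report
        match = RECORD_START.match(line)
        if match:
            number, name, report = match.groups()
            records.append({"Number": int(number), "Name": name, "Report": report})
        elif records:
            # Anything else (once a record exists) continues the current report.
            records[-1]["Report"] += " " + line
    return pd.DataFrame(records, columns=["Number", "Name", "Report"])

sample = """Number|Name|Report
58|John|John is great

John is good

I like John
[Report Ends]
""".splitlines()

df = parse_reports(sample)
print(df)
```

The header line is skipped automatically here because it does not start with digits and no record exists yet. If your Number column can contain non-digit characters, the pattern would need to be adjusted.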

Upvotes: 1
