MaJak
MaJak

Reputation: 3

Parsing output (text) into dataframe with columns and rows

Complete newby here. I'm trying to parse a simulation output with Python into a frame and write it into an excel sheet.

The output looks like this:

F100.T,557.9567856878748,F,F,F,F,
F100.Tv,557.9567856878748,F,F,F,F,
F100.Tl,557.9567856878748,F,F,F,F,
F100.Duty,-106382.60618934222,F,F,F,T,1
... 
F200.T,557.9567856878748,F,F,F,F,
F200.Tv,557.9567856878748,F,F,F,F,
F200.Tl,557.9567856878748,F,F,F,F,
F200.Duty,-37798.28473117316,F,F,F,T,1
... and so on

How it should look like at the end:

    | F100               | F200           |
    | -------------------|----------------| 
 T  | 557.9567856878748  | 100            |
 Tv | 557.9567856878748  | 5.550847203    |
 T1 |-106382.60618934222 | 3.798721561    |

... and so on

What I've tried so far:

import itertools
import pandas as pd


def read_lines(file_object) -> list:
    return [
        parse_line(line) for line in file_object.readlines() if line.strip()
    ]


def parse_line(line: str) -> list:
    return [
        i.split(",")[1]
        for i in line.strip().split()
        if i.startswith(("F100", "F200"))
    ]


def flatten(parsed_lines: list) -> list:
    return list(itertools.chain.from_iterable(parsed_lines))


def cut_into_pieces(flattened_lines: list, piece_size: int = 2) -> list:
    return [
        flattened_lines[i:i + piece_size] for i
        in range(0, len(flattened_lines), piece_size)
    ]


with open("sim.txt") as data:
    df = pd.DataFrame(
        cut_into_pieces(flatten(read_lines(data))),
        columns=["F100", "F200"],
    )
    print(df)
    df.to_excel("table.xlsx", index=False)

So far, the result looks like this:

                    F100                 F200
1      557.9567856878748    100
2      557.9567856878748    5.550847203
3    -106382.60618934222    3.798721561
..                   ...                  ...

As you see, the rows are not named (T, Tv, T1 etc.). Im hitting a wall and don't know how to continue from here.
Thanks in advance

Upvotes: 0

Views: 159

Answers (1)

tozCSS
tozCSS

Reputation: 6134

import pandas as pd
from io import StringIO
txt = """\
F100.T,557.9567856878748,F,F,F,F,
F100.Tv,557.9567856878748,F,F,F,F,
F100.Tl,557.9567856878748,F,F,F,F,
F100.Duty,-106382.60618934222,F,F,F,T
F200.T,557.9567856878748,F,F,F,F,
F200.Tv,557.9567856878748,F,F,F,F,
F200.Tl,557.9567856878748,F,F,F,F,
F200.Duty,-37798.28473117316,F,F,F,T
"""
df = pd.read_csv(StringIO(txt),header=None)
# create two new columns from the first column
df[['a','b']] = df[0].str.split('.',expand=True)
df.pivot('b','a',1)
# note: if index/column values are not unique
# then you can aggregate them like this:
# df.pivot_table(index='b',columns='a',values=1, aggfunc='mean')
F100 F200
Duty -106383 -37798.3
T 557.957 557.957
Tl 557.957 557.957
Tv 557.957 557.957

Upvotes: 1

Related Questions