Reputation: 555

Two types of headers in a txt file to a pandas DataFrame

Let's say I have a .txt file like this:

#D=H|ID|STRINGIDENTIFIER
#D=T|SEQ|DATETIME|VALUE
H|879|IDENTIFIER1
T|1|1569972384|7
T|2|1569901951|9
T|3|1569801600|8
H|892|IDENTIFIER2
T|1|1569972300|109
T|2|1569907921|101
T|3|1569803600|151

And I need to create a dataframe like this:

IDENTIFIER      SEQ DATETIME    VALUE
879_IDENTIFIER1 1   1569972384  7
879_IDENTIFIER1 2   1569901951  9
879_IDENTIFIER1 3   1569801600  8
892_IDENTIFIER2 1   1569972300  109
892_IDENTIFIER2 2   1569907921  101
892_IDENTIFIER2 3   1569803600  151

What code could I use to do this?

Upvotes: 0

Answers (1)

TomMc

Reputation: 11

A basic way to do it is to process the text file, convert it into CSV, and then load it with the read_csv function in pandas. Assuming the file you want to process is as consistent as the example:

import pandas as pd
with open('text.txt', 'r') as file:
    fileAsRows = file.read().split('\n')

pdInput = 'IDENTIFIER,SEQ,DATETIME,VALUE\n'  # add header row
for row in fileAsRows:
    cols = row.split('|')  # break up row; '#D=' lines match neither branch below

    if row.startswith('H|'):  # get identifier info from H row
        Identifier = cols[1] + '_' + cols[2]

    elif row.startswith('T|'):  # get other info from T row
        Seq = cols[1]
        DateTime = cols[2]
        Value = cols[3]

        # Identifier comes from the most recent H row above this T row
        tempList = [Identifier, Seq, DateTime, Value]
        pdInput += ','.join(tempList) + '\n'

with open("pdInput.csv", "w") as file:  # "w" overwrites; "a" would append a duplicate header and rows on every re-run
    file.write(pdInput)

## import into pandas
df = pd.read_csv("pdInput.csv")
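
If the intermediate CSV file is not needed, the same idea can be sketched by collecting the rows in memory and building the DataFrame directly. This is only a minimal sketch under a few assumptions: the function name parse_h_t_file is made up for illustration, every T row follows an H row, and SEQ, DATETIME and VALUE are always integers as in the example data.

```python
import io

import pandas as pd


def parse_h_t_file(lines):
    """Build a DataFrame from H (header) and T (detail) rows.

    `lines` is any iterable of strings, e.g. an open file object.
    (Hypothetical helper, not part of pandas.)
    """
    records = []
    identifier = None  # set by the most recent H row
    for line in lines:
        line = line.strip()
        if not line or line.startswith('#'):
            continue  # skip blank lines and the '#D=' layout lines
        cols = line.split('|')
        if cols[0] == 'H':
            identifier = f"{cols[1]}_{cols[2]}"
        elif cols[0] == 'T':
            if identifier is None:
                raise ValueError("T row found before any H row")
            # Assumption: the three T fields are integers, as in the example
            records.append((identifier, int(cols[1]), int(cols[2]), int(cols[3])))
    return pd.DataFrame(records, columns=['IDENTIFIER', 'SEQ', 'DATETIME', 'VALUE'])


# The sample data from the question, wrapped in StringIO to stand in for a file
sample = """#D=H|ID|STRINGIDENTIFIER
#D=T|SEQ|DATETIME|VALUE
H|879|IDENTIFIER1
T|1|1569972384|7
T|2|1569901951|9
T|3|1569801600|8
H|892|IDENTIFIER2
T|1|1569972300|109
T|2|1569907921|101
T|3|1569803600|151
"""

df = parse_h_t_file(io.StringIO(sample))
print(df)
```

With a real file, the call would look like parse_h_t_file(open('text.txt')), ideally inside a with block so the file gets closed.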

Upvotes: 1
