Read txt file to pandas dataframe with unique delimiter and end of line

Question

I have a text file of a table, with a unique delimiter and a unique set of characters to mark the end of each line / row.

e.g. a new column is marked by #%# and a new row by ##@##

So the text file might read...

cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west

Which should be read as a table with 3 rows and 4 columns, where I can add column names during loading.


cat	dog	rat	cow
red	blue	green	yellow
north	south	east	west

I've tried pd.read_csv("file_name.txt", delimiter="#%#", lineterminator="##@##") with engine set to both python and c, but the c engine can't accept more than one character for the delimiter, and the python engine can't accept a custom lineterminator.

Is my only option to read the text file, change the delimiter and end of line value to a single character, save and read again using read_csv?

Björn · Accepted Answer

I guess, as pointed out by matheubv, there is no option to solve this with pd.read_csv directly. However, this can easily be fixed with a few lines of code. Just open the file (sample.csv in the example) and swap the custom markers for ordinary ones with the string method .replace(). Afterwards you can turn the data, now held as a string in string_data, into a DataFrame with a very basic list comprehension.

Hope this workaround helps you.

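import pandas as pd
from pathlib import Path

p = Path("Data/sample.csv")

# Read the whole file and map the custom markers to ordinary separators
with p.open() as f:
    string_data = f.read().strip().replace('#%#', ';').replace('##@##', '\n')

# Split into rows on '\n', then into columns on ';'
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)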

Output:

       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
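
The question also asked about setting column names while loading and about avoiding a save-and-reload round trip. Below is a minimal sketch of two ways that might be done. The helper names parse_split and parse_read_csv, the column names c1 to c4, and the inline sample string are all made up for illustration; neither helper is part of pandas. The first splits on the original markers directly, so no intermediate ';' is needed. The second hands an in-memory buffer to pd.read_csv, assuming the python engine treats a multi-character sep as a regular expression.

```python
import io
import re

import pandas as pd


def parse_split(text, col_sep="#%#", row_sep="##@##", columns=None):
    """Split on the custom markers directly; every value stays a string."""
    rows = [row.split(col_sep) for row in text.strip().split(row_sep) if row]
    return pd.DataFrame(rows, columns=columns)


def parse_read_csv(text, col_sep="#%#", row_sep="##@##", columns=None):
    """Swap only the row marker for a newline, then let read_csv parse
    an in-memory buffer, so nothing is written back to disk."""
    buffer = io.StringIO(text.strip().replace(row_sep, "\n"))
    # A separator longer than one character is interpreted as a regex by
    # the python engine, so it is escaped here to be matched literally.
    return pd.read_csv(buffer, sep=re.escape(col_sep), engine="python",
                       header=None, names=columns)


# Hypothetical sample data and column names, mirroring the question.
sample = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
          "##@##north#%#south#%#east#%#west")
names = ["c1", "c2", "c3", "c4"]

df_split = parse_split(sample, columns=names)
df_csv = parse_read_csv(sample, columns=names)
print(df_split)
```

The read_csv variant has the side benefit that numeric columns get their dtypes inferred, whereas the split variant leaves everything as strings. Both assume the markers never occur inside a cell value.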
