Jamie

Reputation: 57

Read txt file to pandas dataframe with unique delimiter and end of line

I have a text file of a table, with a unique delimiter and a unique set of characters to mark the end of each line / row.

E.g. a new column is marked by #%# and a new row by ##@##

So the text file might read...

cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west

Which should be read as a table with 3 rows and 4 columns, where I can add column names during loading.

cat dog rat cow
red blue green yellow
north south east west

I've tried pd.read_csv("file_name.txt", delimiter="#%#", lineterminator="##@##") with both engine="python" and engine="c", but the C engine can't accept more than one character for the delimiter, and the Python engine doesn't accept lineterminator.

Is my only option to read the text file, change the delimiter and end of line value to a single character, save and read again using read_csv?

Upvotes: 2

Views: 4124

Answers (2)

Björn

Reputation: 1822

As matheubv pointed out, there doesn't seem to be a way to solve this with pd.read_csv alone. However, it can easily be fixed with a few lines of code. Just open the file (sample.csv in the example) and convert the markers with the string method .replace(). You can then turn the resulting string, stored in string_data, into a DataFrame with a very basic list comprehension.

Hope this workaround helps.

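import pandas as pd
from pathlib import Path

p = Path("Data/sample.csv")

# Read the whole file (not just its first line), drop surrounding whitespace,
# then map the custom markers to ';' (column) and '\n' (row).
# Assumes the data itself contains no ';' or newline characters.
with p.open() as f:
    string_data = f.read().strip().replace('#%#', ';').replace('##@##', '\n')

# One inner list per row, one element per column.
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)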

Output:

       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
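
If the data might itself contain a ';', a variation on the same idea is to split on the two markers directly and pass the column names at the same time. This is only a minimal sketch: parse_table is a hypothetical helper, the column names a to d are made up for illustration, and the sample string is the one from the question.

```python
import pandas as pd

COL_SEP = "#%#"    # column marker from the question
ROW_SEP = "##@##"  # row marker from the question


def parse_table(text, columns=None):
    """Split on the row marker, then on the column marker (hypothetical helper)."""
    rows = [row.split(COL_SEP) for row in text.strip().split(ROW_SEP)]
    return pd.DataFrame(rows, columns=columns)


sample = (
    "cat#%#dog#%#rat#%#cow##@##"
    "red#%#blue#%#green#%#yellow##@##"
    "north#%#south#%#east#%#west"
)

# The column names below are illustrative only.
df = parse_table(sample, columns=["a", "b", "c", "d"])
print(df)
```

Every value comes back as a string, so numeric columns would still need converting afterwards.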

Upvotes: 1

matheubv

Reputation: 166

According to the official documentation:

lineterminator : str (length 1), optional Character to break file into lines. Only valid with C parser.

Therefore I think your best option would be to open the text file and replace the line terminator before using read_csv.
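
A rough sketch of that approach, assuming the data contains no newline characters of its own: replace only the row marker in memory and hand the result to read_csv through io.StringIO, so nothing needs to be saved to disk first. The sample string is taken from the question and the column names are invented for illustration.

```python
import io

import pandas as pd

raw = (
    "cat#%#dog#%#rat#%#cow##@##"
    "red#%#blue#%#green#%#yellow##@##"
    "north#%#south#%#east#%#west"
)

# Replace only the row marker; the column marker is left for read_csv.
text = raw.replace("##@##", "\n")

# A separator longer than one character is treated as a regular expression
# and requires the Python engine. '#' and '%' have no special regex meaning.
df = pd.read_csv(
    io.StringIO(text),
    sep="#%#",
    engine="python",
    header=None,
    names=["a", "b", "c", "d"],  # illustrative column names
)
print(df)
```

For a real file, the raw string would come from reading the file's contents instead of the hard-coded sample.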

Upvotes: 0
