Reputation: 57
I have a text file of a table, with a unique delimiter and a unique set of characters to mark the end of each line / row.
e.g.
new column marked by #%#
new row marked by ##@##
So the text file might read...
cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow##@##north#%#south#%#east#%#west
Which should be read as a table with 3 rows and 4 columns, where I can add column names during loading.
cat | dog | rat | cow |
red | blue | green | yellow |
north | south | east | west |
I've tried pd.read_csv("file_name.txt", delimiter="#%#", lineterminator="##@##")
with both the python and c engines, but the c engine can't accept more than one character for the delimiter, and the python engine doesn't accept a custom lineterminator.
Is my only option to read the text file, change the delimiter and end of line value to a single character, save and read again using read_csv?
Upvotes: 2
Views: 4124
Reputation: 1822
As matheubv pointed out, I don't think pd.read_csv has an option that solves this directly. However, it can be worked around easily with a few lines of code. Open the file (sample.csv in the example), read it into a string, and use the string method .replace() to swap the custom markers for ordinary single characters. You can then build the DataFrame from the resulting string (string_data) with a very basic list comprehension.
Hope this workaround helps.
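import pandas as pd
from pathlib import Path

p = Path("Data/sample.csv")
with p.open() as f:
    # read() takes the whole file; readline() would stop at the first physical newline
    string_data = f.read().strip().replace('#%#', ';').replace('##@##', '\n')

# One list per row, one element per column
df = pd.DataFrame([x.split(';') for x in string_data.split('\n')])
print(df)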
       0      1      2       3
0    cat    dog    rat     cow
1    red   blue  green  yellow
2  north  south   east    west
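The question also asks for column names at load time. A minimal sketch of a variation that splits on the original markers directly, so no intermediate character such as ';' can collide with the data. The helper name parse_table and the column names are hypothetical, chosen only for illustration:

```python
import pandas as pd

COL_SEP = "#%#"    # column marker from the question
ROW_SEP = "##@##"  # row marker from the question


def parse_table(text, columns=None):
    """Split raw text on the row marker, then each row on the column marker."""
    rows = [row.split(COL_SEP) for row in text.strip().split(ROW_SEP) if row]
    return pd.DataFrame(rows, columns=columns)


# Sample data copied from the question
raw = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
       "##@##north#%#south#%#east#%#west")

# Hypothetical column names, supplied while building the frame
df = parse_table(raw, columns=["a", "b", "c", "d"])
print(df)
```

For a file on disk, the text would come from something like Path("Data/sample.csv").read_text() instead of the inline string.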
Upvotes: 1
Reputation: 166
According to the official documentation:
lineterminator : str (length 1), optional. Character to break file into lines. Only valid with C parser.
Therefore I think your best option would be to open the text file and replace the line terminator before using read_csv.
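A minimal sketch of that idea, assuming the file fits in memory: only the row marker is replaced, and read_csv handles the multi-character column marker through the python engine, which treats a separator longer than one character as a regular expression ("#%#" contains no regex metacharacters). The column names are hypothetical, and the inline string stands in for the file contents:

```python
import io

import pandas as pd

# Sample data copied from the question; in practice, read this from the file
raw = ("cat#%#dog#%#rat#%#cow##@##red#%#blue#%#green#%#yellow"
       "##@##north#%#south#%#east#%#west")

# Replace only the row marker, then hand an in-memory buffer to read_csv
buffer = io.StringIO(raw.replace("##@##", "\n"))

table = pd.read_csv(
    buffer,
    sep="#%#",          # multi-character separator, needs engine="python"
    engine="python",
    header=None,        # the data has no header row
    names=["a", "b", "c", "d"],  # hypothetical column names
)
print(table)
```

Using io.StringIO avoids writing a converted copy of the file back to disk before reading it again.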
Upvotes: 0