Reputation: 6899
It appears that the pandas read_csv function only allows single-character delimiters/separators. Is there some way to use a string of characters instead, such as "*|*" or "%%"?
Upvotes: 28
Views: 36167
Reputation: 1701
In pandas 1.1.4, when I try to use a multi-character separator, I get this message:
ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
So, to use a multi-character separator, a modern solution seems to be adding engine='python' to the read_csv arguments (in my case, I use it with sep='[ ]?;').
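A minimal sketch of that approach, using made-up in-memory data in place of a real file: a separator longer than one character is treated as a regular expression, and passing engine="python" selects the parser that supports it, so the fallback ParserWarning is not emitted.

```python
import io

import pandas as pd

# Hypothetical data: fields separated by ";" with an optional space before it.
data = io.StringIO("a ;b;c\n1 ;2;3\n4;5 ;6\n")

# "[ ]?;" is longer than one character, so pandas treats it as a regex;
# engine="python" requests the regex-capable parser explicitly.
df = pd.read_csv(data, sep="[ ]?;", engine="python")
print(df)
```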
Upvotes: 0
Reputation: 45
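It's not the most Pythonic approach, but you can parse each line yourself with something like this, which splits on the delimiter and re-joins double-quoted fields that contain it (file_name is a placeholder for your file's path):
```python
import re

def row_reader(row, fd):
    """Split `row` on the delimiter `fd`, re-joining the pieces of a field that
    is wrapped in double quotes and itself contains `fd`."""
    arr = []
    in_arr = row.split(fd)
    i = 0
    while i < len(in_arr):
        if re.match('^".*', in_arr[i]) and not re.match('.*"$', in_arr[i]):
            # Opening quote without a closing one: collect pieces until one ends with '"'.
            flag = True
            buf = ''
            while flag and i < len(in_arr):
                buf += in_arr[i]
                if re.match('.*"$', in_arr[i]):
                    flag = False
                i += 1
                # Put back the delimiter that split() removed, unless the field is complete.
                buf += fd if flag else ''
            arr.append(buf)
        else:
            arr.append(in_arr[i])
            i += 1
    return arr

file_name = 'data.txt'  # hypothetical path to the "%%"-delimited file

with open(file_name, 'r') as infile:
    for row in infile:
        # Strip the trailing newline so it doesn't end up in the last field.
        for field in row_reader(row.rstrip('\n'), '%%'):
            print(field)
```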
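The same split-then-rejoin idea can be written a little more compactly, and the parsed rows can be handed to pandas. This is a sketch under assumptions: split_quoted is a hypothetical helper name, the sample data is made up, and, like row_reader, it keeps the surrounding quotes in the field value.

```python
import io

import pandas as pd


def split_quoted(row, delim):
    """Split row on delim, keeping double-quoted fields that contain delim intact."""
    fields = []
    buf = None  # pieces of a quoted field that is still open
    for part in row.rstrip("\n").split(delim):
        if buf is None:
            if part.startswith('"') and not (len(part) > 1 and part.endswith('"')):
                buf = part  # opening quote without a closing one
            else:
                fields.append(part)
        else:
            buf += delim + part  # restore the delimiter removed by split()
            if part.endswith('"'):
                fields.append(buf)
                buf = None
    if buf is not None:
        fields.append(buf)  # unterminated quote: keep what was collected
    return fields


# Hypothetical sample input; iterating over open(file_name) works the same way.
sample = io.StringIO('1%%"a%%b"%%3\n4%%"c"%%6\n')
rows = [split_quoted(line, "%%") for line in sample]
df = pd.DataFrame(rows)
print(df)
```

All values come back as strings, so convert column types afterwards if needed.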
Upvotes: 0
Reputation: 76396
As Padraic Cunningham writes in the comment above, it's unclear why you want this. The Wikipedia entry on the CSV format says this about delimiters:
... separated by delimiters (typically a single reserved character such as comma, semicolon, or tab; sometimes the delimiter may include optional spaces),
It's unsurprising that neither the csv module nor pandas supports what you're asking.
However, if you really want to do so, you're pretty much down to using Python's string manipulations. The following example shows how to turn the dataframe into a "csv" with $$ separating lines and %% separating columns:
```python
'$$'.join('%%'.join(str(r) for r in rec) for rec in df.to_records())
```
Of course, you don't have to turn it into a string like this prior to writing it into a file.
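Reading such output back is the harder half. pandas documents lineterminator as a single character (and C-parser only), so a "$$" row separator would need manual splitting; the sketch below therefore assumes newline-separated rows and only uses "%%" between columns. The DataFrame and its values are made up for illustration, and a header line is written first so the column names survive the round trip.

```python
import io

import pandas as pd

df = pd.DataFrame({"x": [1, 2], "y": ["a", "b"]})

# Build the text by hand: "%%" between columns, newline between rows.
lines = ["%%".join(df.columns)]
lines += ["%%".join(str(v) for v in row) for row in df.itertuples(index=False)]
text = "\n".join(lines) + "\n"

# Reading it back needs the regex-capable python engine;
# "%" has no special meaning in a regular expression.
back = pd.read_csv(io.StringIO(text), sep="%%", engine="python")
print(back)
```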
Upvotes: 1
Reputation: 2915
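Pandas does now support multi-character delimiters:
```python
import pandas as pd

# sep is a regular expression here, so "*" and "|" are escaped (the raw string
# avoids invalid-escape warnings); engine="python" avoids the fallback ParserWarning.
pd.read_csv(csv_file, sep=r"\*\|\*", engine="python")
```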
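Escaping the pattern by hand is easy to get wrong. A small sketch, assuming the delimiter is meant literally, lets the standard library's re.escape build the pattern instead; the in-memory data is made up for illustration.

```python
import io
import re

import pandas as pd

delimiter = "*|*"
# re.escape backslash-escapes the regex metacharacters "*" and "|",
# giving the same pattern as the hand-written r"\*\|\*".
pattern = re.escape(delimiter)

data = io.StringIO("a*|*b*|*c\n1*|*2*|*3\n")
df = pd.read_csv(data, sep=pattern, engine="python")
print(pattern)
print(df)
```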
Upvotes: 13
Reputation: 6899
The solution would be to use read_table instead of read_csv. Given a file like this:
```
1*|*2*|*3*|*4*|*5
12*|*12*|*13*|*14*|*15
21*|*22*|*23*|*24*|*25
```
we could read it with:
```python
pd.read_table('file.csv', header=None, sep=r'\*\|\*', engine='python')
```
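To check this without a file on disk, a minimal sketch can hold the sample rows above in memory. read_table is read_csv with a tab as its default separator, so once sep is given explicitly the two calls behave the same way; the StringIO stand-in for file.csv is an assumption for illustration.

```python
import io

import pandas as pd

# The sample rows from above, held in memory instead of in file.csv.
data = io.StringIO(
    "1*|*2*|*3*|*4*|*5\n"
    "12*|*12*|*13*|*14*|*15\n"
    "21*|*22*|*23*|*24*|*25\n"
)

# header=None: the first line is data, so columns are numbered 0..4.
df = pd.read_table(data, header=None, sep=r"\*\|\*", engine="python")
print(df)
```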
Upvotes: 5