Reputation: 13141
It appears that the pandas to_csv function only allows single-character delimiters/separators.
Is there some way to use a string of several characters, such as "::" or "%%", instead?
I tried:
df.to_csv(local_file, sep = '::', header=None, index=False)
and got:
TypeError: "delimiter" must be a 1-character string
Upvotes: 4
Views: 6284
Reputation: 11
Use numpy.savetxt.
Examples:
import numpy as np

# chunk_data is the pandas DataFrame being written
np.savetxt(
    'file.csv',
    np.char.decode(chunk_data.values.astype(np.bytes_), 'UTF-8'),
    delimiter='~|',
    fmt='%s',
    encoding=None)

np.savetxt(
    'file.dat',
    chunk_data.values,
    delimiter='~|',
    fmt='%s',
    encoding='utf-8')
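A self-contained sketch of the second example, for anyone who wants to try it directly. The small DataFrame, the temporary path, and the "::" delimiter are stand-ins chosen for illustration (chunk_data above is whatever DataFrame you are writing).

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Stand-in data; any DataFrame is written the same way.
df = pd.DataFrame({"id": [1, 2], "name": ["alice", "bob"]})
path = os.path.join(tempfile.mkdtemp(), "out.txt")

# With fmt="%s", savetxt builds each row's format by joining one "%s" per
# column with the delimiter, so a multi-character delimiter is allowed.
np.savetxt(path, df.values, fmt="%s", delimiter="::", encoding="utf-8")

with open(path, encoding="utf-8") as f:
    content = f.read()
print(content)
```

One caveat that follows from how savetxt builds that row format: the delimiter itself goes through %-formatting, so a delimiter containing % (such as "%%") would need every % doubled, e.g. delimiter="%%%%" to get %% in the file.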
Upvotes: 5
Reputation: 1380
For the moment I am stuck on an old version of pandas. My task was to read a CSV with "__" delimiters, clean it to remove personally identifying information, and write the results to a new file. I need the result to have the same two-character delimiter.
My preferred solution would have been to convert to numpy and save, like this:
import numpy as np
import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")
# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)
np_data = massaged.to_numpy()
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")
However, to_numpy() isn't supported in the version of Pandas I have.
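DataFrame.to_numpy() was added in pandas 0.24; on older releases the long-standing .values attribute returns the same NumPy array, so the savetxt route may still be usable there. A sketch under that assumption: the sample file, its column names, and the temporary paths are hypothetical stand-ins for patient_patient-final.txt. It also shows savetxt's header= and comments="" parameters, in case a header row should be kept.

```python
import os
import tempfile

import numpy as np
import pandas as pd

tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "patients.txt")           # hypothetical input file
dst = os.path.join(tmp, "patients_massaged.txt")  # hypothetical output file

# Hypothetical sample input that uses the two-character "__" delimiter.
with open(src, "w") as f:
    f.write("id__first__city\n1__Ana__Lima\n2__Luis__Quito\n")

# A multi-character sep is treated as a regex and needs the python engine.
df = pd.read_csv(src, sep="__", engine="python")
massaged = df.drop(["first"], axis=1)

# .values returns the underlying NumPy array on old and new pandas alike.
np.savetxt(
    dst,
    massaged.values,
    fmt="%s",
    delimiter="__",
    header="__".join(massaged.columns),  # header line written before the data
    comments="",                         # the default "# " would prefix it
)

with open(dst) as f:
    content = f.read()
print(content)
```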
So, my fix was to generate a csv with "}" as a temp delimiter, save that to a variable, do a string replace, and write the file myself:
import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")
# remove personal identifying info from dataframe
massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)
# with no file path, to_csv returns the CSV text as a string
x = massaged.to_csv(sep="}", header=False, index=False)
x = x.replace("}", "__")
with open("patient_massaged.txt", "w") as f:
    f.write(x)
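The placeholder trick silently corrupts rows if the data already contain the placeholder character: pandas quotes such fields, but replace() still rewrites the "}" inside them. A minimal sketch that wraps the same steps in a helper and refuses such data; the function name to_multichar_csv and its parameters are hypothetical.

```python
import pandas as pd


def to_multichar_csv(df, sep, placeholder="}"):
    """Hypothetical helper: render df with a one-character placeholder
    separator, then swap the placeholder for the real separator."""
    # Any placeholder already present in a cell would also be replaced.
    if any(placeholder in str(v) for v in df.values.ravel()):
        raise ValueError("placeholder %r occurs in the data" % placeholder)
    # With no path argument, to_csv returns the CSV text as a string.
    text = df.to_csv(sep=placeholder, header=False, index=False)
    return text.replace(placeholder, sep)


df = pd.DataFrame({"id": [1, 2], "city": ["Lima", "Quito"]})
print(to_multichar_csv(df, "__"))
```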
Upvotes: 0
Reputation: 366083
Think about what the line a::b::c means to a standard CSV tool that uses a single colon as its delimiter: an a, an empty column, a b, an empty column, and a c. Even in a more complicated case with quoting or escaping, "abc::def"::2 means an abc::def, an empty column, and a 2.
So all you have to do is add an empty column between every pair of adjacent columns and then use : as the delimiter, and the output will be almost what you want.
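A sketch of that trick, assuming a single colon as the real delimiter and unique column names; the helper name with_spacer_columns is hypothetical. The spacer columns get an empty name so that a header row also comes out as a::b.

```python
import pandas as pd


def with_spacer_columns(df):
    """Hypothetical helper: return a copy of df with an empty, unnamed
    column inserted between each pair of adjacent columns."""
    parts = []
    for i, name in enumerate(df.columns):
        if i > 0:
            # Empty values and an empty name, so rows and header both get "::".
            parts.append(pd.Series([""] * len(df), index=df.index, name=""))
        parts.append(df[name])
    return pd.concat(parts, axis=1)


df = pd.DataFrame({"a": ["x", "abc:def"], "b": [1, 2]})
text = with_spacer_columns(df).to_csv(sep=":", index=False)
print(text)
```

The second data row shows the caveat: the value containing a single colon comes out quoted, as "abc:def"::2.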
I say “almost” because pandas is going to quote or escape values that contain single colons. Depending on the dialect options you’re using, and the tool you’re trying to interact with, this may or may not be a problem. Unnecessary quoting usually isn’t a problem (unless you ask for QUOTE_ALL, because then your columns will be separated by :"":, so hopefully you don’t need that dialect option), but unnecessary escapes might be (e.g., you might end up with every single : in a string turned into a \: or something). So you have to be careful with the options. But it’ll work for the basic “quote as needed, with mostly standard other options” settings.
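To see how those dialect options change the output, here is a small comparison on a frame that already contains the empty spacer column (the column name gap is arbitrary). It uses the standard csv module's QUOTE_ALL and QUOTE_NONE constants, which to_csv accepts through its quoting= parameter.

```python
import csv

import pandas as pd

# One data row with the empty spacer column already in place.
spaced = pd.DataFrame({"a": ["abc:def"], "gap": [""], "b": [2]})
opts = dict(sep=":", header=False, index=False)

# Default QUOTE_MINIMAL: only the field containing ":" is quoted.
minimal = spaced.to_csv(**opts).strip()
# QUOTE_ALL: every field is quoted, so columns end up separated by :"":
quote_all = spaced.to_csv(quoting=csv.QUOTE_ALL, **opts).strip()
# QUOTE_NONE needs an escapechar; each ":" inside a value gets escaped.
escaped = spaced.to_csv(quoting=csv.QUOTE_NONE, escapechar="\\", **opts).strip()

for line in (minimal, quote_all, escaped):
    print(line)
```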
Upvotes: 0