Joe

Reputation: 13141

Python Pandas - use Multiple Character Delimiter when writing to_csv

It appears that the pandas to_csv function only allows single character delimiters/separators.

Is there some way to use a multi-character string, such as "::" or "%%", as the delimiter instead?

I tried:

df.to_csv(local_file, sep='::', header=None, index=False)

and got:

TypeError: "delimiter" must be a 1-character string

Upvotes: 4

Views: 6284

Answers (3)

AYAN JAVA

Reputation: 11

Use numpy.savetxt.

Examples:

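import numpy as np

# chunk_data is assumed to be a pandas DataFrame.

# Variant 1: convert every value to a string array first, then write it.
np.savetxt(
    'file.csv',
    np.char.decode(chunk_data.values.astype(np.bytes_), 'UTF-8'),
    delimiter='~|',
    fmt='%s',
    encoding=None)

# Variant 2: pass the values directly; fmt='%s' formats each one as a string.
np.savetxt(
    'file.dat',
    chunk_data.values,
    delimiter='~|',
    fmt='%s',
    encoding='utf-8')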
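
For anyone who wants to try this without their own data, here is a minimal self-contained sketch of the same idea. The tiny DataFrame, the "::" delimiter, and the temporary file path are made up for illustration. It writes with np.savetxt and reads the file back with the python engine of read_csv to check the round trip.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative data only; any DataFrame should work the same way.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# Write the raw values with a two-character delimiter.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
np.savetxt(path, df.values, fmt="%s", delimiter="::")

with open(path) as f:
    written = f.read()

# Multi-character separators need engine="python" when reading back.
round_trip = pd.read_csv(path, sep="::", engine="python", header=None)
print(written)
```

Note that savetxt does no quoting or escaping, so a value that itself contains the delimiter will not survive the round trip.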

Upvotes: 5

slogan621

Reputation: 1380

For the moment I am stuck on an old version of pandas. My task was to read a CSV with "__" delimiters, clean it to remove personal identifying information, and write the results to a new file. I need the result to have the same two-character delimiter.

My preferred solution would have been to convert to numpy and save, like this:

import numpy as np
import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe

massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)
np_data = massaged.to_numpy()
np.savetxt("patient_massaged.txt", np_data, fmt="%s", delimiter="__")

However, to_numpy() isn't supported in the version of Pandas I have.

So, my fix was to generate a CSV with "}" as a temporary delimiter (this only works if that character never appears in the data), save that to a variable, do a string replace, and write the file myself:

import pandas

df = pandas.read_csv("patient_patient-final.txt", sep="__", engine="python")

# remove personal identifying info from dataframe

massaged = df.drop(['paternal_last', 'maternal_last', 'first', 'middle', 'suffix', 'prefix', 'street1', 'street2', 'phone1', 'phone2', 'email', 'emergencyfullname', 'emergencyphone', 'emergencyemail', 'curp', 'oldid'], axis=1)

x = massaged.to_csv(sep="}", header=False, index=False)
x = x.replace("}", "__")

# The with block closes the file even if the write fails.
with open("patient_massaged.txt", "w") as f:
    f.write(x)
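
If no single character is safe to use as a temporary delimiter, another option is to skip to_csv and join the fields yourself. This is only a rough sketch: rows_to_text is a hypothetical helper name, the sample DataFrame is made up, and no quoting or escaping is applied, so it assumes the separator never occurs inside a value.

```python
import pandas as pd


def rows_to_text(df, sep="__"):
    """Hypothetical helper: join each row's values with an arbitrary separator."""
    lines = []
    for row in df.itertuples(index=False, name=None):
        # str() is applied to every value, so NaN is written as "nan".
        lines.append(sep.join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Illustrative data only.
df = pd.DataFrame({"id": [1, 2], "city": ["Oslo", "Lima"]})
text = rows_to_text(df)
print(text)
```

The returned string can then be written to a file with an ordinary open(...) and write(...) call, just like the to_csv output above.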

Upvotes: 0

abarnert

Reputation: 366083

Think about what the line a::b::c means to a standard CSV tool that uses : as its delimiter: an a, an empty column, a b, an empty column, and a c. Even in a more complicated case with quoting or escaping, "abc::def"::2 means an abc::def, an empty column, and a 2.

So, all you have to do is add an empty column between every column, and then use : as a delimiter, and the output will be almost what you want.

I say “almost” because Pandas is going to quote or escape single colons. Depending on the dialect options you’re using, and the tool you’re trying to interact with, this may or may not be a problem. Unnecessary quoting usually isn’t a problem (unless you ask for QUOTE_ALL, because then your columns will be separated by :"":, so hopefully you don’t need that dialect option), but unnecessary escapes might be (e.g., you might end up with every single : in a string turned into a \: or something). So you have to be careful with the options. But it’ll work for the basic “quote as needed, with mostly standard other options” settings.
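
A minimal sketch of the empty-column trick, assuming default quoting (QUOTE_MINIMAL). The helper name to_double_colon_csv, the filler column names, and the sample DataFrame are all made up for illustration.

```python
import pandas as pd


def to_double_colon_csv(df):
    """Hypothetical helper: emulate a '::' separator using ':' plus empty columns."""
    parts = []
    last = len(df.columns) - 1
    for i, col in enumerate(df.columns):
        parts.append(df[col])
        if i < last:
            # Empty filler column; its name is arbitrary because header=False.
            parts.append(pd.Series("", index=df.index, name="_pad%d" % i))
    padded = pd.concat(parts, axis=1)
    return padded.to_csv(sep=":", header=False, index=False)


# Illustrative data; the second row has a value that contains a colon.
df = pd.DataFrame({"x": ["a", "p:q"], "y": [1, 2], "z": ["c", "d"]})
text = to_double_colon_csv(df)
print(text)
```

With these settings the value containing a colon comes out wrapped in quotes, which is the "almost" part: whatever reads the file has to understand that quoting.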

Upvotes: 0
