Reputation: 1231
I've read my data into a pandas DataFrame. I want to split the data into separate files based on two columns, "Zone" and "Type".
For each combination, I have something like:
contents[(contents['Zone']==zone) & (contents['Type']==type)].to_csv(outfl, sep=' ', header=False, index = False, float_format='%9.3f')
Strangely, my output looks like this:
200 225 255 504671.321 6342290.967 " -323.271" 1 "    0.040" "    0.319" "    0.249" "    0.141" "    2.000"
202 224 254 504721.351 6342265.992 " -323.725" 1 "    0.032" "    0.254" "    0.258" "    0.127" "    2.000"
200 225 254 504671.321 6342290.967 " -323.350" 1 "    0.038" "    0.376" "    0.243" "    0.137" "    2.000"
201 225 254 504696.336 6342290.967 " -323.593" 1 "    0.035" "    0.359" "    0.249" "    0.128" "    2.000"
Why are these quote characters appearing? I obviously don't want them, since I'm trying to create a space-delimited output file. It seems like I'm doing something wrong with float_format, but I'm not sure what.
Edit: adding more information, as requested in a comment:
print contents.info()
yields:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 233976 entries, 0 to 233975
Data columns (total 12 columns):
I 233976 non-null int64
J 233976 non-null int64
K 233976 non-null int64
X 233976 non-null float64
Y 233976 non-null float64
Z 233976 non-null float64
Type 233976 non-null int64
VMI_LVMI 233976 non-null float64
SWT 233976 non-null float64
PHIT 233976 non-null float64
VCLA 233976 non-null float64
Zone 233976 non-null float64
dtypes: float64(8), int64(4)
memory usage: 23.2 MB
None
Upvotes: 0
Views: 1972
Reputation: 353479
Ah, this is simpler than it seemed. Your format "%9.3f" pads each number on the left with spaces to a minimum width of 9 characters, so anything shorter than that gets leading spaces:
>>> format(123.456, "9.3f")
'  123.456'
>>> format(123789.456, "9.3f")
'123789.456'
But since your separator is also a space, the output becomes ambiguous: a space is both a separator and part of the data. So those fields get quoted, which lets the file be parsed back unambiguously:
>>> df.to_csv("out.csv", sep=";", float_format="%9.3f")
>>> !cat out.csv
;A;B
0;1;    0.000
1;2;    0.333
2;3;    0.667
>>> df.to_csv("out.csv", sep=" ", float_format="%9.3f")
>>> !cat out.csv
A B
0 1 "    0.000"
1 2 "    0.333"
2 3 "    0.667"
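A quick way to check the "parsed back" claim, as a sketch: the frame below is an assumption (A = 1..3, B = thirds) chosen to match the printed output above. It writes the padded, space-separated CSV to a string and reads it back with the same separator; the quotes let read_csv recover the original values.

```python
import io

import pandas as pd

# Assumed frame matching the output above: A = 1..3, B = 0, 1/3, 2/3
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.0, 1 / 3, 2 / 3]})

# Padded floats + space separator: fields containing a space get quoted
text = df.to_csv(sep=" ", float_format="%9.3f")
print(text)

# Because of the quotes, reading back with the same separator is unambiguous
back = pd.read_csv(io.StringIO(text), sep=" ", index_col=0)
# astype(float) also copes with the leading spaces if the column came back as strings
values = back["B"].astype(float).tolist()
print(values)  # [0.0, 0.333, 0.667]
```

The quoting is only a problem for readers that don't honour the quote character; pandas itself round-trips the file fine.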
I'm not a big fan of space-delimited files in the first place, but if you really want one, you can simply change your format so it doesn't specify a minimum width. After changing the last value of B to 123456.789 to make the frame more interesting:
>>> df.to_csv("out.csv", sep=" ", float_format="%.3f", index=False)
>>> !cat out.csv
A B
1 0.000
2 0.333
3 123456.789
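Applied to the question's split by Zone and Type, the unpadded format might look like the sketch below. The small contents frame, the output directory and the file-naming scheme are hypothetical; iterating over groupby on both columns is a swapped-in alternative to filtering the whole frame once per (zone, type) pair, not the asker's original loop. The loop variable is called zone_type so it doesn't shadow Python's built-in type.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for the question's `contents` frame (a few columns only)
contents = pd.DataFrame({
    "X": [504671.321, 504721.351, 504696.336, 504671.321],
    "Z": [-323.271, -323.725, -323.593, -323.350],
    "Type": [1, 1, 2, 1],
    "Zone": [2.0, 2.0, 2.0, 3.0],
})

outdir = tempfile.mkdtemp()
written = []
# groupby yields one (key tuple, sub-frame) pair per (Zone, Type) combination
for (zone, zone_type), group in contents.groupby(["Zone", "Type"]):
    # Hypothetical naming scheme, e.g. zone2_type1.txt
    outfl = os.path.join(outdir, f"zone{zone:g}_type{zone_type}.txt")
    # "%.3f" sets no minimum width, so no field contains a space and nothing is quoted
    group.to_csv(outfl, sep=" ", header=False, index=False, float_format="%.3f")
    written.append(outfl)

for path in written:
    with open(path) as fh:
        print(os.path.basename(path), fh.read().splitlines())
```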
Alternatively, if you want to keep the alignment but not the quotes, you could use df.to_string() and write that out:
>>> s = df.to_string(float_format=lambda x: "%9.3f" % x)
>>> print(s)
A B
0 1 0.000
1 2 0.333
2 3 123456.789
Whether or not that's a good idea depends on whether whatever reads the file can cope with a delimiter that is a variable-length run of spaces. (Python's csv module, for example, can't: its delimiter must be a single character.)
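If the reader is pandas itself, runs of spaces are not a problem: read_csv with sep=r"\s+" treats any run of whitespace as a single delimiter. A minimal sketch, assuming the same modified frame as above (last value of B set to 123456.789):

```python
import io

import pandas as pd

# Assumed frame: the answer's example after "making it more interesting"
df = pd.DataFrame({"A": [1, 2, 3], "B": [0.0, 1 / 3, 123456.789]})

# Aligned, quote-free text, as produced in the answer
s = df.to_string(float_format=lambda x: "%9.3f" % x)

# sep=r"\s+" splits on any run of whitespace and ignores leading whitespace.
# The header row has one field fewer than the data rows, so pandas uses the
# first column (the written index) as the index.
back = pd.read_csv(io.StringIO(s), sep=r"\s+")
print(back)
```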
Upvotes: 2
Reputation: 251568
Your float format pads the values with spaces, but you are also using a space as the field separator. So the fields have to be quoted; otherwise you couldn't tell which spaces are part of a (padded) float value and which are field separators.
To fix it, either don't pad your values, or don't use space as a separator. It's probably more sensible to not pad the values. Space-padding is a visual presentation tweak that essentially turns your floats into strings. If you just care about outputting the float values, you don't care about whether they're nicely padded to a particular field width.
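The other option, keeping the padding but changing the separator, can be sketched with a tab. The two-column frame is a made-up excerpt of the question's data; since no padded field contains a tab, nothing needs quoting.

```python
import io

import pandas as pd

# Made-up excerpt of the question's data
df = pd.DataFrame({"Z": [-323.271, -323.725], "SWT": [0.254, 0.319]})

# Same padded "%9.3f" format, but tab-separated: the padding spaces are no
# longer ambiguous, so the writer leaves the fields unquoted
text = df.to_csv(sep="\t", float_format="%9.3f", index=False)
print(text)

# Reading back; astype(float) tolerates the leading spaces either way
back = pd.read_csv(io.StringIO(text), sep="\t")
print(back["Z"].astype(float).tolist())
```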
Upvotes: 2