Dance Party
Dance Party

Reputation: 3713

Pandas df.columns.values.tostring()

When I use the following on a df...

df.columns.values.tostring()

I get the following which are not at all like my column names (and there are far fewer columns than that). When I omit "tolist()", I just get the column names.

b'0\x16B\n\x00\x00\x00\x00p\x84P\n\x00\x00\x00\x00\xf0\xe7x\t\x00\x00\x00\x00\xb0\xf3J\n\x00\x00\x00\x00p\xfc\t\x0c\x00\x00\x00\x000\xad\xd7\x00\x00\x00\x00\x00p\xae\xd7\x00\x00\x00\x00\x00\xf0\xab\xd7\x00\x00\x00\x00\x00(9\x05\x01\x00\x00\x00\x00\xf0\xa7\xdd\x0b\x00\x00\x00\x00p\xac\xdd\x0b\x00\x00\x00\x00\xf0\xed\xc1\x00\x00\x00\x00\x00\xb0\xa3\xdd\x0b\x00\x00\x00\x000g\xdd\x0b\x00\x00\x00\x00p\xf2\xb2\x0c\x00\x00\x00\x000\xf1\xb2\x0c\x00\x00\x00\x00\xf0\xf0\xb2\x0c\x00\x00\x00\x00\xb0\xf0\xb2\x0c\x00\x00\x00\x00\xa0w\x9a\x05\x00\x00\x00\x000\xae\xd7\x00\x00\x00\x00\x00\x90\x9c\xe4\x00\x00\x00\x00\x00\xd0U\n\x0c\x00\x00\x00\x00\xb0\xfa\t\x0c\x00\x00\x00\x00\xb0\n\xca\x00\x00\x00\x00\x00\x88\x8e\xbb\x00\x00\x00\x00\x00\xf0\x05\xca\x00\x00\x00\x00\x00\x90<y\t\x00\x00\x00\x00\x18?y\t\x00\x00\x00\x00\xb0\x01\xca\x00\x00\x00\x00\x00\xb0=y\t\x00\x00\x00\x00\xf8=y\t\x00\x00\x00\x00p\xac\xd7\x00\x00\x00\x00\x00\xb0\xad\xd7\x00\x00\x00\x00\x00'

I can't figure out why. The df is a product of several instances of pd.merge and type conversions.

Upvotes: 1

Views: 1911

Answers (1)

DSM
DSM

Reputation: 353199

This isn't really a pandas thing, it's a numpy thing. df.columns.values gives us a numpy array:

>>> df = pd.DataFrame({"A": [1,2,3], "B": [4,5,6]})
>>> df
   A  B
0  1  4
1  2  5
2  3  6
>>> df.columns
Index(['A', 'B'], dtype='object')
>>> df.columns.values
array(['A', 'B'], dtype=object)

The tostring method of a numpy array promises:

Construct Python bytes containing the raw data bytes in the array.

Constructs Python bytes showing a copy of the raw contents of data memory. The bytes object can be produced in either ‘C’ or ‘Fortran’, or ‘Any’ order (the default is ‘C’-order). ‘Any’ order means C-order unless the F_CONTIGUOUS flag in the array is set, in which case it means ‘Fortran’ order.

This function is a compatibility alias for tobytes. Despite its name it returns bytes not strings.

which is why you get something messy:

>>> df.columns.values.tostring()
b'\xe0N\x0e\xb7\x00\\\x14\xb7'

Upvotes: 2

Related Questions