Franco Morero

Reputation: 559

Why does pandas DataFrame.to_csv produce different output from Series.to_csv?

I need a one-line CSV with the values separated by commas. When I iterate over my DataFrame using apply, my function receives a Series object, and its to_csv method gives me a str split across several lines, with None written as "" and without any commas. But if I iterate over the DataFrame with a for loop, my function receives a DataFrame object, and to_csv gives me a str on a single line, with commas, and without turning None into "".

Here is code to reproduce this:

import pandas


def print_csv(tabular_data):
    print(type(tabular_data))
    csv_data = tabular_data.to_csv(header=False, index=False)
    print(csv_data)


df = pandas.DataFrame([
    {"a": None, "b": 0.32, "c": 0.43},
    {"a": None, "b": 0.23, "c": 0.12},
])

df.apply(lambda x: print_csv(x), axis=1)

for i in range(0, df.shape[0]):
    print_csv(df[i:i+1])

console output using apply:

<class 'pandas.core.series.Series'>
""
0.32
0.43
<class 'pandas.core.series.Series'>
""
0.23
0.12

console output using for:

<class 'pandas.core.frame.DataFrame'>
,0.32,0.43
<class 'pandas.core.frame.DataFrame'>
,0.23,0.12

I tried csv_data = tabular_data.to_csv(header=False, index=False, sep=',') in my function, but I got the same output.

Why do I get different output when I call to_csv on a DataFrame and on a Series?

What do I need to change so that apply gives the same result as the for loop?

Upvotes: 3

Views: 553

Answers (1)

Franco Morero

Reputation: 559

Well, after a lot of research, it turns out my output is different because that is the expected behavior. I found a PR in the pandas repository where a contributor posts a snippet using Series.to_csv and gets the same kind of output I do (this is the comment from toobaz).

A Series is the data structure for a single column of a DataFrame, so what my print_csv function really gets is a one-column data structure holding my row's data. This is the output of print(tabular_data.head()) inside print_csv for one row when it is called through df.apply(lambda x: print_csv(x), axis=1):

<class 'pandas.core.series.Series'>
a    None
b    0.23
c    0.12
Name: 1, dtype: object

So it is fine for the CSV to look like that, because to_csv is writing one line per element of the Series (that is, one line for each column of the original row):

""
0.23
0.12
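The "" on the first line can be reproduced without pandas. The sketch below assumes that pandas hands its rows to Python's csv module with default (minimal) quoting, which is my understanding of the internals rather than something stated in the PR. The helper name write_rows is made up for this illustration. With minimal quoting, a row whose only field is an empty string is written as "" so that it is not confused with a blank line, while an empty field inside a longer row needs no quotes.

```python
import csv
import io


def write_rows(rows):
    """Hypothetical helper: write rows with csv.writer and return the text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


# One field per row, like the Series: the lone empty field gets quoted.
column_like = write_rows([[""], [0.23], [0.12]])
print(repr(column_like))

# One row with three fields, like the DataFrame: no quoting is needed.
row_like = write_rows([["", 0.23, 0.12]])
print(repr(row_like))
```

So the missing commas and the "" are two symptoms of the same thing: every value ends up alone on its own CSV row.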

To get the output I want, I need to turn the one-column data structure into a one-row one. To do that, I convert the Series object to a DataFrame using pandas.Series.to_frame and transpose it (I use the T property of the DataFrame, which is an accessor for pandas.DataFrame.transpose).

I changed the apply call to:

df.apply(lambda x: print_csv(x.to_frame().T), axis=1)

With the sample DataFrame from the question, the output of print_csv called through apply is now what I expected:

<class 'pandas.core.frame.DataFrame'>
,0.32,0.43
<class 'pandas.core.frame.DataFrame'>
,0.23,0.12
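If the goal is to collect the lines rather than print them, the same conversion can be wrapped in a small function. This is only a sketch: row_to_csv_line is a name I made up, and stripping the trailing line terminator is my own choice. It also compares the result with calling to_csv once on the whole DataFrame and splitting the text into lines, which should give the same strings for this sample data.

```python
import pandas as pd


def row_to_csv_line(row):
    """Hypothetical helper: render one row (a Series) as a single CSV line."""
    # to_frame() gives a one-column DataFrame; .T turns it into one row.
    text = row.to_frame().T.to_csv(header=False, index=False)
    return text.rstrip("\r\n")  # drop the trailing line terminator


df = pd.DataFrame([
    {"a": None, "b": 0.32, "c": 0.43},
    {"a": None, "b": 0.23, "c": 0.12},
])

# apply passes each row as a Series; the result is a Series of strings.
lines = df.apply(row_to_csv_line, axis=1).tolist()
print(lines)

# Alternative: serialize the whole frame once and split it into lines.
whole = df.to_csv(header=False, index=False).splitlines()
print(whole)
```

For large frames the single to_csv call avoids building a temporary DataFrame per row, so it is probably the better choice when no per-row logic is needed.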

Upvotes: 1
