I need a one-line CSV with the data separated by commas (,). My problem is that when I iterate over my DataFrame using apply, my function gets a Series object, and its to_csv method gives me a string split over several lines, with None written as "" and without any commas. But if I iterate over the DataFrame with a for loop, my function gets a DataFrame object, and to_csv gives me a string on one line, separated by commas, without writing None as "".
Here is code to test this:
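import pandas

def print_csv(tabular_data):
    # Show which type the function receives, then its CSV serialization
    print(type(tabular_data))
    csv_data = tabular_data.to_csv(header=False, index=False)
    print(csv_data)

df = pandas.DataFrame([
    {"a": None, "b": 0.32, "c": 0.43},
    {"a": None, "b": 0.23, "c": 0.12},
])

# apply with axis=1 calls print_csv once per row, passing each row as a Series
df.apply(lambda x: print_csv(x), axis=1)

# Slicing with df[i:i+1] passes each row as a one-row DataFrame
for i in range(0, df.shape[0]):
    print_csv(df[i:i+1])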
Console output using apply:
<class 'pandas.core.series.Series'>
""
0.32
0.43
<class 'pandas.core.series.Series'>
""
0.23
0.12
Console output using for:
<class 'pandas.core.frame.DataFrame'>
,0.32,0.43
<class 'pandas.core.frame.DataFrame'>
,0.23,0.12
I also tried passing the separator explicitly in my function, csv_data = tabular_data.to_csv(header=False, index=False, sep=','), but I got the same output.
Why do I get different output when I use the to_csv method on a DataFrame and on a Series?
What do I need to change so that apply gives the same result as the for loop?
Well, I researched a lot, and my output is different because this is the expected behavior. I found a PR in the pandas repository where a contributor added a snippet using Series.to_csv that produces the same output I get (this is the comment from toobaz).
Because a Series is the data structure for a single column of a DataFrame, what my print_csv function actually receives is a one-column structure holding one row's data. This is the output of print(tabular_data.head()) inside print_csv when it is called with df.apply(lambda x: print_csv(x), axis=1), for one of the rows:
<class 'pandas.core.series.Series'>
a None
b 0.23
c 0.12
Name: 1, dtype: object
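A minimal sketch, reusing the sample data from the question, that makes the difference visible without printing any CSV: apply with axis=1 hands each row over as a Series of shape (3,), while the slice df[0:1] used in the for loop is still a DataFrame of shape (1, 3). The variable names are only illustrative.

```python
import pandas

df = pandas.DataFrame([
    {"a": None, "b": 0.32, "c": 0.43},
    {"a": None, "b": 0.23, "c": 0.12},
])

# What apply(axis=1) passes to the function: the type name of each row object
passed = df.apply(lambda x: type(x).__name__, axis=1)
print(list(passed))  # ['Series', 'Series']

row_as_series = df.iloc[0]  # one row as a Series, like the one apply passes
row_as_frame = df[0:1]      # a one-row slice stays a DataFrame, like in the for loop

print(type(row_as_series).__name__, row_as_series.shape)  # Series (3,)
print(type(row_as_frame).__name__, row_as_frame.shape)    # DataFrame (1, 3)
```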
So it is expected that the CSV looks like this, because to_csv writes a Series as a single column, with one line per value:
""
0.23
0.12
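The "" on the first line comes from the CSV writer rather than from anything special about None: to_csv writes missing values as an empty string (the default na_rep=''), and, as far as I know, pandas writes the rows through Python's standard csv module, which quotes a row made of a single empty field so that it is not emitted as a blank line. The sketch below reproduces this with the csv module alone; in a row with several fields the empty value is left unquoted, which is why the DataFrame version prints ,0.32,0.43 instead.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")

writer.writerow([""])                  # a single empty field is quoted as "" so the line is not blank
writer.writerow(["", "0.32", "0.43"])  # with several fields, the empty one is written as nothing

print(buffer.getvalue())
# ""
# ,0.32,0.43
```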
To get the output I want, I need to change the one-column data structure into a one-row one. To do that, I convert the Series to a DataFrame using pandas.Series.to_frame and then transpose it (I use the DataFrame's T property, which is an accessor to pandas.DataFrame.transpose).
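A small sketch of the two steps on a single row. The row is built by hand here, assuming it matches what apply passes (a Series indexed by the column names and named after the row label): to_frame() gives a (3, 1) frame with one line per value, and .T turns it into a (1, 3) frame with one column per value.

```python
import pandas

# Hand-built stand-in for the second row that apply would pass
row = pandas.Series({"a": None, "b": 0.23, "c": 0.12}, name=1)

column_frame = row.to_frame()  # one column (named 1), one line per value
row_frame = column_frame.T     # transposed: one row, columns a, b, c

print(column_frame.shape)       # (3, 1)
print(row_frame.shape)          # (1, 3)
print(list(row_frame.columns))  # ['a', 'b', 'c']
print(row_frame.to_csv(header=False, index=False))
```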
I changed the apply call to:
df.apply(lambda x: print_csv(x.to_frame().T), axis=1)
With the sample DataFrame from the question, the new output of print_csv called from apply is what I expected:
<class 'pandas.core.frame.DataFrame'>
,0.32,0.43
<class 'pandas.core.frame.DataFrame'>
,0.23,0.12
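If the goal is to collect the lines rather than print them, the same idea can return one string per row. This is a sketch under my own assumptions: the helper name row_to_csv_line is hypothetical, and stripping the trailing line terminator is a choice made so that each row comes back as a bare line.

```python
import pandas

def row_to_csv_line(row):
    # row is one row of df as a Series; make it a one-row DataFrame first
    one_row_frame = row.to_frame().T
    # to_csv ends the line with a line terminator; drop it to keep a bare line
    return one_row_frame.to_csv(header=False, index=False).rstrip("\r\n")

df = pandas.DataFrame([
    {"a": None, "b": 0.32, "c": 0.43},
    {"a": None, "b": 0.23, "c": 0.12},
])

lines = df.apply(row_to_csv_line, axis=1)
print(list(lines))  # [',0.32,0.43', ',0.23,0.12']
```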