Reputation: 13853
I have a pandas DataFrame in which one of the columns contains string elements, and those string elements contain newlines that I would like to print literally. But they just appear as \n in the output.
That is, I want to print this:
pos bidder
0 1
1 2
2 3 <- alice
<- bob
3 4
but this is what I get:
pos bidder
0 1
1 2
2 3 <- alice\n<- bob
3 4
How can I accomplish what I want? Can I use a DataFrame, or will I have to revert to manually printing padded columns one row at a time?
Here's what I have so far:
import pandas as pd

n = 4
output = pd.DataFrame({
    'pos': range(1, n+1),
    'bidder': [''] * n
})
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    row = pos - 1  # row labels start at 0, positions at 1
    if pos in used_pos:
        arrow = output.loc[row, 'bidder']
        output.loc[row, 'bidder'] = arrow + "\n<- %s" % bidder
    else:
        output.loc[row, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)
Upvotes: 32
Views: 57627
Reputation: 893
Using .set_properties() and the CSS white-space property [for use in IPython notebooks]
Another way is to use pandas's pandas.io.formats.style.Styler.set_properties() method with the CSS property "white-space": "pre-wrap":
from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'white-space': 'pre-wrap',
}))
To keep the text left-aligned, you might want to add 'text-align': 'left' as below:
from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
}))
Upvotes: 31
Reputation: 61
Somewhat in line with unsorted's answer:
import pandas as pd

# Save the original `to_html` function to call it later
pd.DataFrame.base_to_html = pd.DataFrame.to_html
# Call it here in a controlled way
pd.DataFrame.to_html = (
    lambda df, *args, **kwargs:
        (df.base_to_html(*args, **kwargs)
           .replace(r"\n", "<br/>"))
)
This way, you don't need to call any explicit function in Jupyter notebooks, as to_html is called internally. If you want the original function, call base_to_html (or whatever you named it).
I'm using jupyter 1.0.0, notebook 5.7.6.
Upvotes: 6
Reputation: 3274
If you're trying to do this in ipython notebook, you can do:
from IPython.display import display, HTML

def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))
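To see why the replacement targets a backslash followed by n rather than a real newline, here is a minimal sketch that runs outside a notebook. The sample frame is made up to mirror the question. It relies on the same behaviour the function above depends on: the newline shows up in the to_html() output as the two characters \ and n, so "\\n" is the string to swap for a line break.

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "pos": [1, 2, 3, 4],
    "bidder": ["", "", "<- alice\n<- bob", ""],
})

raw_html = df.to_html()
# Replace the literal backslash-n that appears in the HTML with a line break tag
fixed_html = raw_html.replace("\\n", "<br>")
print(fixed_html)
```

Passing fixed_html to IPython's HTML(...) is then the same step as in pretty_print above.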
Upvotes: 47
Reputation: 561
From the pandas.DataFrame documentation:
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure
So you can't have a row without an index, and a newline "\n" inside a cell won't start a new row when the DataFrame is printed.
You could overwrite 'pos' with an empty value, and output the next 'bidder' on the next row. But then index and 'pos' would be offset every time you do that. Like:
pos bidder
0 1
1 2
2 3 <- alice
3 <- bob
4 5
So if a bidder called 'frank' had 4 as its value, it would overwrite 'bob', and the problem grows as you add more bidders. It is probably possible to keep using a DataFrame and write code to work around this, but it may be worth looking into other solutions.
Here is the code to produce the output structure above.
import pandas as pd

n = 5
output = pd.DataFrame({'pos': range(1, n + 1),
                       'bidder': [''] * n},
                      columns=['pos', 'bidder'])
# Store 'pos' as object so a cell can later hold an empty string
output['pos'] = output['pos'].astype(object)
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    if pos in used_pos:
        # Position already taken: use the next row and blank its 'pos'
        output.loc[pos, 'bidder'] = "<- %s" % bidder
        output.loc[pos, 'pos'] = ''
    else:
        output.loc[pos - 1, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)
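One possible way around the overwrite problem is to leave the data untouched and build a separate, display-only frame in which every line of a multi-line cell gets its own row. The sketch below assumes the cells still contain "\n" as in the question; expand_multiline is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

def expand_multiline(df, column):
    """Return a display-only copy with one row per text line in `column`."""
    records = []
    labels = []
    for label, row in df.iterrows():
        for i, line in enumerate(str(row[column]).split("\n")):
            # Continuation rows are blank except for the split column
            record = {name: "" for name in df.columns}
            if i == 0:
                record.update(row.to_dict())
            record[column] = line
            records.append(record)
            labels.append(label if i == 0 else "")
    return pd.DataFrame(records, columns=df.columns, index=labels)

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "pos": [1, 2, 3, 4],
    "bidder": ["", "", "<- alice\n<- bob", ""],
})
expanded = expand_multiline(df, "bidder")
print(expanded.to_string())
```

Because the original frame is never modified, a later bidder at position 4 still lands in its own row instead of replacing 'bob'.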
Edit:
Another option is to restructure the data and output. You could use the pos values as columns and create a new row for each key/person in the data. The code example below prints the DataFrame with NaN values replaced by empty strings.
import pandas as pd

data = {'johnny\nnewline': 2, 'alice': 3, 'bob': 3,
        'frank': 4, 'lisa': 1, 'tom': 8}
n = range(1, max(data.values()) + 1)
# Create DataFrame with columns = pos
output = pd.DataFrame(columns=n, index=[])
# Populate DataFrame with rows
for index, (bidder, pos) in enumerate(data.items()):
    output.loc[index, pos] = bidder
# Print the DataFrame and remove NaN to make it easier to read.
print(output.fillna(''))
# Fetch and print every element in column 2
for index in range(1, 5):
    print(output.loc[index, 2])
It depends on what you want to do with the data though. Good luck :)
Upvotes: 5