Reputation: 13853

Pretty printing newlines inside a string in a Pandas DataFrame

I have a Pandas DataFrame in which one of the columns contains strings, and those strings contain newlines that I would like to be printed as actual line breaks. Instead, they appear as \n in the output.

That is, I want to print this:

  pos     bidder
0   1
1   2
2   3  <- alice
       <- bob
3   4

but this is what I get:

  pos            bidder
0   1
1   2
2   3  <- alice\n<- bob
3   4

How can I accomplish what I want? Can I use a DataFrame, or will I have to revert to manually printing padded columns one row at a time?

Here's what I have so far:

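import pandas as pd

n = 4
output = pd.DataFrame({
    'pos': range(1, n+1),
    'bidder': [''] * n
})
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    row = pos - 1  # positions are 1-based, index labels are 0-based
    if pos in used_pos:
        # .loc replaces .ix, which was removed in pandas 1.0
        arrow = output.loc[row, 'bidder']
        output.loc[row, 'bidder'] = arrow + "\n<- %s" % bidder
    else:
        output.loc[row, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)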

Upvotes: 32

Answers (4)

yongjieyongjie

Reputation: 893

Using pandas `.set_properties()` and CSS `white-space` property

[For use in IPython notebooks]

Another option is to use the pandas.io.formats.style.Styler.set_properties() method (available as df.style.set_properties()) with the CSS property "white-space": "pre-wrap":

from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'white-space': 'pre-wrap',
}))

To keep the text left-aligned, you might want to add 'text-align': 'left' as below:

from IPython.display import display

# Assuming the variable df contains the relevant DataFrame
display(df.style.set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
}))

Upvotes: 31

Roger d'Amiens

Reputation: 61

Somewhat in line with unsorted's answer:

import pandas as pd

# Save the original `to_html` function to call it later
pd.DataFrame.base_to_html = pd.DataFrame.to_html
# Wrap it so that escaped newlines in its output become HTML line breaks
pd.DataFrame.to_html = (
    lambda df, *args, **kwargs:
        (df.base_to_html(*args, **kwargs)
           .replace(r"\n", "<br/>"))
)

This way, you don't need to call any explicit function in Jupyter notebooks, as to_html is called internally. If you want the original function, call base_to_html (or whatever you named it).

I'm using jupyter 1.0.0, notebook 5.7.6.

Upvotes: 6

unsorted

Reputation: 3274

If you're trying to do this in an IPython notebook, you can do:

from IPython.display import display, HTML

def pretty_print(df):
    return display(HTML(df.to_html().replace("\\n", "<br>")))
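
As far as I can tell, the replacement works because `to_html()` writes a newline inside a cell as the two characters backslash and n, which is what the `"\\n"` literal matches. Here is a minimal sketch that separates building the HTML from displaying it, so the string can be inspected outside a notebook. The helper name `newlines_to_br` and the sample data are hypothetical.

```python
import pandas as pd

def newlines_to_br(df):
    """Return the DataFrame as HTML with escaped newlines turned into <br> tags."""
    return df.to_html().replace("\\n", "<br>")

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "pos": [3],
    "bidder": ["<- alice\n<- bob"],
})
html = newlines_to_br(df)

# In a notebook, render it with:
#   from IPython.display import display, HTML
#   display(HTML(html))
```

Note that any cell that legitimately contains a backslash followed by n would be rewritten as well.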

Upvotes: 47

oystein-hr

Reputation: 561

From the pandas.DataFrame documentation:

Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and column labels. Can be thought of as a dict-like container for Series objects. The primary pandas data structure.

So you can't have a row without an index, and a newline "\n" inside a cell will not be shown as a line break when the DataFrame is printed.

You could overwrite 'pos' with an empty value, and output the next 'bidder' on the next row. But then index and 'pos' would be offset every time you do that. Like:

  pos    bidder
0   1          
1   2          
2   3  <- alice
3        <- bob
4   5

So if a bidder called 'frank' had 4 as value, it would overwrite 'bob'. This would cause more problems as you add bidders. It is probably possible to write code that works around this with a DataFrame, but it may be worth looking into other solutions.

Here is the code to produce the output structure above.

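import pandas as pd

n = 5
output = pd.DataFrame({'pos': range(1, n + 1),
                      'bidder': [''] * n},
                      columns=['pos', 'bidder'])
# Allow 'pos' to hold both integers and empty strings
output['pos'] = output['pos'].astype(object)
bids = {'alice': 3, 'bob': 3}
used_pos = []
for bidder, pos in bids.items():
    # .loc replaces .ix, which was removed in pandas 1.0
    if pos in used_pos:
        # Position already taken: write to the next row and blank its 'pos'
        output.loc[pos, 'bidder'] = "<- %s" % bidder
        output.loc[pos, 'pos'] = ''
    else:
        output.loc[pos - 1, 'bidder'] = "<- %s" % bidder
        used_pos.append(pos)
print(output)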
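
A variation on the same idea that avoids the offset problem is to keep the newlines in the data and only split them into extra rows when printing. This is a sketch rather than a definitive solution: it assumes pandas 0.25 or later for `DataFrame.explode`, and the sample data and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data mirroring the question
df = pd.DataFrame({
    "pos": [1, 2, 3, 4],
    "bidder": ["", "", "<- alice\n<- bob", ""],
})

# Split each cell on newlines and give every piece its own row.
# explode() repeats the original index label for each piece.
exploded = df.assign(bidder=df["bidder"].str.split("\n")).explode("bidder")

# Mark the rows that continue a previous row (repeated index label)
is_continuation = exploded.index.duplicated(keep="first")

# Blank out 'pos' and the index label on continuation rows
exploded["pos"] = [
    "" if dup else pos
    for pos, dup in zip(exploded["pos"], is_continuation)
]
exploded.index = [
    "" if dup else label
    for label, dup in zip(exploded.index, is_continuation)
]

print(exploded)
```

Because the split happens on a copy, the original `df` keeps one row per position, so a later bidder at position 4 does not collide with 'bob'.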

Edit:

Another option is to restructure the data and output. You could use the pos values as columns and create a new row for each key/person in the data. The code example below prints the DataFrame with NaN values replaced by an empty string.

import pandas as pd

data = {'johnny\nnewline': 2, 'alice': 3, 'bob': 3,
        'frank': 4, 'lisa': 1, 'tom': 8}
n = range(1, max(data.values()) + 1)

# Create DataFrame with columns = pos
output = pd.DataFrame(columns=n, index=[])

# Populate DataFrame with rows
for index, (bidder, pos) in enumerate(data.items()):
    output.loc[index, pos] = bidder

# Print the DataFrame and remove NaN to make it easier to read.
print(output.fillna(''))

# Fetch and print every element in column 2
for index in output.index:
    print(output.loc[index, 2])

It depends what you want to do with the data though. Good luck :)

Upvotes: 5
