Reputation: 14067
A pyspark.sql.DataFrame displays messily with DataFrame.show(): lines wrap instead of scrolling horizontally. The same data displays fine with pandas.DataFrame.head().
I tried these options:
import IPython
IPython.auto_scroll_threshold = 9999
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display
but no luck. Scrolling does work, though, when the notebook is run inside the Atom editor with the Jupyter plugin.
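For reference, a minimal sketch that reproduces the wrapping (the app name and data are made up, assuming a local SparkSession):
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName("wrap-demo").getOrCreate()
# A deliberately wide DataFrame: many columns make show() wider than the cell
wide = spark.range(3).selectExpr(*[f"id AS col_{i}" for i in range(30)])
wide.show()  # wraps in a Jupyter cell instead of scrolling horizontally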
Upvotes: 47
Views: 35898
Reputation: 6343
Try running this in its own cell:
%%html
<style>
div.jp-OutputArea-output pre {
    white-space: pre;
}
</style>
This is based on the solution posted by @MateoB27 (their code did not work for me, though it was close).
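If you would rather stay in Python than use an %%html cell, the same rule can be injected with IPython's display utilities (a sketch; the selector assumes JupyterLab):
from IPython.display import display, HTML
# Same CSS as above, injected from a regular Python cell
display(HTML("<style>div.jp-OutputArea-output pre { white-space: pre; }</style>"))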
Upvotes: 2
Reputation: 673
If anyone is still facing the issue, it can be resolved by tweaking the page's styling with the browser's developer tools.
Open the developer tools (F12), then inspect element (Windows: Ctrl+Shift+C, Mac: Cmd+Option+C). Click to select the DataFrame output, then uncheck the white-space property in the styles panel.
You only need to do this once (unless you refresh the page).
This shows the exact data natively, as is. No need to convert to pandas.
Upvotes: 18
Reputation: 313
I would create a small function that converts the PySpark DataFrame to a pandas DataFrame and takes the head, then call it like this.
Function:
def display_df(df):
    return df.limit(5).toPandas().head()
Then call:
display_df(spark_df)
You do have to have pandas installed for toPandas() to work:
import pandas as pd
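Since the preview goes through pandas, you may also want pandas itself to render every column; these are standard pandas options (the values are a matter of taste):
import pandas as pd
# Let pandas render all columns and auto-detect the display width
pd.set_option("display.max_columns", None)
pd.set_option("display.width", None)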
Upvotes: 0
Reputation: 21
What worked for me: since I'm using an environment where I don't have access to the CSS files and wanted to do it in a cell, Jupyter magic commands gave me a neat solution.
Found the solution at https://stackoverflow.com/a/63476260/11795760
Just paste this in a cell:
%%html
<style>
div.output_area pre {
    white-space: pre;
}
</style>
This also works in Scala notebooks.
Upvotes: 1
Reputation: 977
This solution does not depend on pandas and does not change any Jupyter settings, and it looks good (a scrollbar activates automatically).
import pyspark
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("My App").getOrCreate()
# Render DataFrames eagerly as HTML tables in notebook REPLs
spark.conf.set("spark.sql.repl.eagerEval.enabled", True)

data = [
    [1, 1, 'A'],
    [2, 2, 'A'],
    [3, 3, 'A'],
    [4, 3, 'B'],
    [5, 4, 'B'],
    [6, 5, 'C'],
    [7, 6, 'C'],
]
df = spark.sparkContext.parallelize(data).toDF(('column_1', 'column_2', 'column_3'))

# Evaluating the DataFrame by itself now prints a pretty table
df
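If the rendered table truncates rows or cell contents, eager evaluation has two more knobs (real Spark settings; the values below are just examples):
# Render up to 50 rows and allow up to 1000 characters per cell
spark.conf.set("spark.sql.repl.eagerEval.maxNumRows", 50)
spark.conf.set("spark.sql.repl.eagerEval.truncate", 1000)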
Upvotes: 1
Reputation: 588
Just add (and execute):
from IPython.display import display, HTML
display(HTML("<style>pre { white-space: pre !important; }</style>"))
and you'll get the df.show() output with a scrollbar.
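Since the CSS only stops the wrapping, you may also want show() itself to stop truncating long cell values (truncate is a standard show() parameter):
spark_df.show(truncate=False)  # render full cell contents; the scrollbar handles the width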
Upvotes: 42
Reputation: 357
To be precise about what someone said before:
In the file anaconda3/lib/python3.7/site-packages/notebook/static/style/style.min.css
there are two occurrences of white-space: nowrap;
You have to comment out the one under samp, like this: samp { /*white-space: nowrap;*/ }
Save the file and restart Jupyter.
Upvotes: 0
Reputation: 1470
Adding to the answers given above by @karan-singla and @vijay-jangir, a handy one-liner to comment out the white-space: pre-wrap styling is:
$ awk -i inplace '/pre-wrap/ {$0="/*"$0"*/"}1' $(dirname `python -c "import notebook as nb;print(nb.__file__)"`)/static/style/style.min.css
This translates as: use awk to update, in place, lines that contain pre-wrap, wrapping them in /* ... */ (i.e. commenting them out), in the style.min.css file found in your working Python environment.
This, in theory, can then be turned into an alias if you use multiple environments, say with Anaconda.
Upvotes: 1
Reputation: 101
Just edit the CSS file and you are good to go.
Open the Jupyter notebook stylesheet ../site-packages/notebook/static/style/style.min.css.
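If you are not sure where that file lives, you can locate it from Python (a sketch that derives the path from the notebook package itself):
import os
import notebook
# Path to the stylesheet shipped with the classic notebook package
print(os.path.join(os.path.dirname(notebook.__file__), "static", "style", "style.min.css"))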
Search for white-space: pre-wrap; and remove it.
Save the file and restart jupyter-notebook.
Problem fixed. :)
Upvotes: 10
Reputation: 14067
This is a workaround:
spark_df.limit(5).toPandas().head()
although I do not know the computational burden of this query. I am thinking limit() is not expensive. Corrections welcome.
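A quick way to check what limit() costs is to inspect the physical plan (a sketch; exact operator names vary by Spark version):
spark_df.limit(5).explain()
# Look for a CollectLimit / GlobalLimit node: Spark stops early once it has
# 5 rows, so only those rows are shipped to the driver for toPandas().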
Upvotes: 35
Reputation: 23
I created the little function below and it works fine:
from IPython.display import HTML

def printDf(sprkDF):
    # Convert to pandas purely for its HTML rendering
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())
You can use it straight on your Spark queries if you like, or on any Spark DataFrame:
printDf(spark.sql('''
select * from employee
'''))
Upvotes: -1