Reputation: 591
I'm doing some analysis with pandas in a Jupyter notebook, and since my apply function takes a long time I would like to see a progress bar. Through this post I found the tqdm library, which provides a simple progress bar for pandas operations. There is also a Jupyter integration that provides a really nice progress bar where the bar itself changes over time.
However, I would like to combine the two and don't quite see how to do that. Let's take the same example as in the documentation:
import pandas as pd
import numpy as np
from tqdm import tqdm
df = pd.DataFrame(np.random.randint(0, 100, (100000, 6)))
# Register `pandas.progress_apply` and `pandas.Series.map_apply` with `tqdm`
# (can use `tqdm_gui`, `tqdm_notebook`, optional kwargs, etc.)
tqdm.pandas(desc="my bar!")
# Now you can use `progress_apply` instead of `apply`
# and `progress_map` instead of `map`
df.progress_apply(lambda x: x**2)
# can also groupby:
# df.groupby(0).progress_apply(lambda x: x**2)
It even says "can use 'tqdm_notebook'", but I can't find out how. I've tried a few things like
tqdm_notebook(tqdm.pandas(desc="my bar!"))
or
tqdm_notebook.pandas
but they don't work. In the definition it looks to me like
tqdm.pandas(tqdm_notebook(desc="my bar!"))
should work, but the bar doesn't properly show the progress and there is still additional output.
Any other ideas?
Upvotes: 49
Views: 83525
Reputation: 11
from tqdm.notebook import tqdm
tqdm.pandas()
This works for versions 4.64.0 and greater.
Upvotes: 1
Reputation: 2116
My working solution (copied from the documentation):
from tqdm.auto import tqdm
tqdm.pandas()
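For completeness, here is a minimal end-to-end sketch of how those two lines fit with the example from the question. It assumes a reasonably recent tqdm where `tqdm.auto` is available; the smaller DataFrame size and the `squared` name are just for illustration. In a notebook with ipywidgets installed this should render the widget bar, and elsewhere it falls back to the plain text bar.

```python
import numpy as np
import pandas as pd
from tqdm.auto import tqdm  # picks the notebook or console bar automatically

# Register progress_apply / progress_map on pandas objects
tqdm.pandas(desc="my bar!")

# Smaller than the question's frame so the sketch runs quickly
df = pd.DataFrame(np.random.randint(0, 100, (1000, 6)))

# Same call as apply, but with a progress bar (one tick per column here)
squared = df.progress_apply(lambda col: col ** 2)
```

Since `progress_apply` takes the same arguments as `apply`, existing calls can usually be switched over by renaming the method.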
Upvotes: 72
Reputation: 19844
If you want to use more than 1 CPU for that slow apply step, consider using swifter. As a bonus, swifter automatically enables a tqdm progress bar on the apply step. To customize the bar description, use:
df.swifter.progress_bar(enable=True, desc='bar description').apply(...)
Upvotes: 6
Reputation: 355
I found that I also had to import tqdm_notebook. A simple example that works in a Jupyter notebook is given below.
Suppose you want to map a function over a variable to create a new variable in your pandas DataFrame.
# progress bar
from tqdm import tqdm, tqdm_notebook
# instantiate
tqdm.pandas(tqdm_notebook)
# replace map with progress_map
# where df is a pandas dataframe
df['new_variable'] = df['old_variable'].progress_map(some_function)
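A self-contained version of the same idea might look like the sketch below. Note the assumptions: it imports from `tqdm.auto` (available in more recent tqdm releases) rather than `tqdm_notebook`, and `some_function` and the column contents are hypothetical stand-ins.

```python
import pandas as pd
from tqdm.auto import tqdm  # assumption: recent tqdm with tqdm.auto

# Register progress_map / progress_apply on pandas objects
tqdm.pandas(desc="mapping")

def some_function(value):
    # Hypothetical stand-in for a slow per-element computation
    return value * 2

df = pd.DataFrame({"old_variable": range(1000)})

# Same as Series.map, but with a progress bar (one tick per element)
df["new_variable"] = df["old_variable"].progress_map(some_function)
```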
Upvotes: 16
Reputation: 16590
You can use:
tqdm_notebook().pandas(*args, **kwargs)
This is because tqdm_notebook has a delayed adapter, so it's necessary to instantiate it before accessing its methods (including class methods).
In the future (>v5.1), you should be able to use a more uniform API:
tqdm_pandas(tqdm_notebook, *args, **kwargs)
Upvotes: 19