user1700890

Reputation: 7730

Understanding what map_partitions in dask does

I am trying to understand what map_partitions in dask does. Here is my example:

import dask.dataframe as dd
import pandas as pd
import random

df = pd.DataFrame({'col_1': random.sample(range(10000), 100),
                   'col_2': random.sample(range(10000), 100)})

def test_f(df):
    print(df.col_1)
    print("------------")

ddf = dd.from_pandas(df, npartitions=8)

# scheduler='processes' is the modern spelling of the old get= keyword
ddf['result'] = ddf.map_partitions(test_f).compute(scheduler='processes')

And here is output:

0    1.0
1    1.0
Name: col_1, dtype: float64
------------

Why don't I get the full printout of my dataframe? What does this output mean?

Upvotes: 2

Views: 2839

Answers (1)

mdurant

Reputation: 28673

map_partitions takes an optional meta= keyword, with which you can tell Dask what the output of your function will look like. This is generally a good idea, since it saves Dask from having to infer the output's structure, which can require significant extra work.

In the absence of meta=, Dask will call your function once up front to infer the output, and then once for each partition. You are seeing the first of these calls. If you provide any meta=, you will only see the partitions. Obviously you'd want to provide the actual expected output template; but in your case the function doesn't actually return anything.

To avoid doing too much work just for inference, Dask uses typical dummy values. In this case, for each float column the value 1.0 is used, and there is more than one row, to ensure the input looks like a dataframe as opposed to a series.

Upvotes: 5
