Reputation: 65

Accessing a value from Dask using .loc

For the life of me, I can't figure out how to combine these two dataframes. I am using the latest versions of everything, including Python, pandas and Dask.

#pandasframe has 10k rows and 3 columns - 
['monkey','banana','furry']

#daskframe has 1.5m rows, 1 column, 135 partitions - 
row.index: 'monkey_banana_furry'
row.mycolumn = 'happy flappy tuna'

My Dask dataframe has a string index. When I do daskframe.loc[index_str], it returns a Dask dataframe, but I thought it was supposed to return a single specific row, and I don't know how to get the row or value I need out of that dataframe. What I want is to pass in the index and get back one specific value.

What am I doing wrong?

Upvotes: 1

Answers (2)

PC_User

Reputation: 97

The trick is to use .compute() since, according to the docs, it "turns a Dask dataframe into a Pandas dataframe."

So, if you wanted to filter your dataframe by a specific name, you could do:

df[df['Names'] == 'MyName'].compute()

Moreover, if you do type(df), you'll get a Dask dataframe type (dask.dataframe.core.DataFrame here; the exact module path varies between Dask versions), but type(df.compute()) will give you pandas.core.frame.DataFrame, so you can treat the result like any pandas dataframe.

Keep in mind that doing this loads the whole result into RAM, just as regular pandas would.

Upvotes: 0

Timeless

Reputation: 37902

Even pandas.DataFrame.loc doesn't return a scalar if you don't specify a label for the columns.
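A small pandas-only sketch of that point, using made-up rows shaped like the ones in the question (a string index and a single column named mycolumn):

```python
import pandas as pd

# Hypothetical two-row frame; the index and column names follow the question.
pdf = pd.DataFrame(
    {"mycolumn": ["happy flappy tuna", "sad soggy cod"]},
    index=["monkey_banana_furry", "ape_mango_smooth"],
)

# Row label only: pandas returns the whole row as a Series, not a scalar.
row = pdf.loc["monkey_banana_furry"]

# Row label plus column label: pandas returns the single cell value.
value = pdf.loc["monkey_banana_furry", "mycolumn"]

print(type(row).__name__, "|", value)
```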

Anyway, to get a scalar in your case, you first need to call dask.dataframe.DataFrame.compute so you get a pandas dataframe (since dask.dataframe.DataFrame.loc returns a Dask dataframe). Only then can you use the pandas .loc.

Assuming dfd is your Dask dataframe and "mycolumn" is the column holding the value, try this:

dfd.loc[index_str].compute().loc[index_str, "mycolumn"]

Or this:

dfd.loc[index_str, "mycolumn"].compute().iloc[0]

Upvotes: 3
