Reputation: 65
For the life of me, I can't figure out how to combine these two dataframes. I am using the latest versions of everything, including Python, pandas and Dask.
#pandasframe has 10k rows and 3 columns -
['monkey','banana','furry']
#daskframe has 1.5m rows, 1 column, 135 partitions -
row.index: 'monkey_banana_furry'
row.mycolumn = 'happy flappy tuna'
My Dask dataframe uses a string as its index for lookups, but daskframe.loc[index_str] returns a Dask dataframe, whereas I expected it to return one specific row.
I don't know how to get the row or value I need out of that result. What I want is to pass in the index and get back one specific value.
What am I doing wrong?
Upvotes: 1
Views: 1155
Reputation: 97
The trick is to use .compute() which, according to the docs, turns a Dask dataframe into a pandas dataframe.
So, if you wanted to filter your dataframe by a specific name, you could do:
df[df['Names'] == 'MyName'].compute()
Moreover, if you do type(df), you'll get dask.dataframe.core.DataFrame, but type(df.compute()) will give you pandas.core.frame.DataFrame, so you can treat the result like any pandas dataframe.
Just keep in mind that doing this loads the result into RAM, as with regular pandas, so filter before calling .compute() rather than after.
Upvotes: 0
Reputation: 37902
Even pandas.DataFrame.loc doesn't return a scalar if you don't specify a column label.
Anyway, to get a scalar in your case, you first need to call dask.dataframe.DataFrame.compute so you get a pandas dataframe (since dask.dataframe.DataFrame.loc returns a Dask dataframe). Only then can you use the pandas .loc.
Assuming dfd is your Dask dataframe, try this:
dfd.loc[index_str].compute().loc[index_str, "mycolumn"]
Or this:
dfd.loc[index_str, "mycolumn"].compute().iloc[0]
Upvotes: 3