w1n

Reputation: 21

How to upload a large unstructured dataset into a MediaSet in Palantir Foundry?

I have a large dataset containing more than 10,000 PDFs that I want to upload into a media set so that I can display them in a Workshop application and perform OCR on them. When I try to create this media set through a Code Repository, I get the following error: "MediaSet: TooManyItemsUploadedInTransaction". Is there a way to get around this transaction limit? I could break the data up into smaller media sets, but then the Workshop display widget does not work properly. The sample code below is the current implementation.

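from transforms.api import transform, Input
from transforms.mediasets import MediaSetOutput

# Reads raw files from a dataset and writes them to a media set.
@transform(
    output_mediaset=MediaSetOutput("<your path to mediaset>"),
    input_dataset=Input("<your path to dataset with raw files>")
)
def compute(input_dataset, output_mediaset):
    # Copies every file in the input dataset in a single call;
    # with more than 10,000 files this raises TooManyItemsUploadedInTransaction.
    output_mediaset.put_dataset_files(input_dataset)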

I tried the code block above and it raises the transaction error. My other attempt was to split the files across multiple media sets; however, that does not work with downstream display applications in Workshop. Any help would be appreciated!

Upvotes: 2

Views: 298

Answers (1)

ZettaP

Reputation: 1399

When you create the media set to write to, you can select the "Real time" option. A real-time media set is not subject to the 10K items-per-transaction limit, but as of today you cannot delete media from it in bulk, nor snapshot it (that is, empty it and start over).
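
If the real-time option does not fit your use case, another direction might be to keep each write under the limit by splitting the file list into batches. The sketch below is plain Python and only shows the batching arithmetic; it does not call any Foundry API. The limit of 10,000 is taken from the figure mentioned above, and `plan_batches`, `MAX_ITEMS_PER_TRANSACTION` and the file names are hypothetical names made up for illustration.

```python
from typing import Iterator, List, Sequence

# Assumed limit, based on the roughly 10,000-item figure mentioned above.
# The exact threshold enforced by the platform is not confirmed here.
MAX_ITEMS_PER_TRANSACTION = 10_000


def plan_batches(
    paths: Sequence[str],
    batch_size: int = MAX_ITEMS_PER_TRANSACTION,
) -> Iterator[List[str]]:
    """Yield consecutive slices of `paths`, each with at most `batch_size` items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(paths), batch_size):
        yield list(paths[start:start + batch_size])


if __name__ == "__main__":
    # Hypothetical file names standing in for the raw PDFs.
    pdf_paths = [f"doc_{i:05d}.pdf" for i in range(25_000)]
    batches = list(plan_batches(pdf_paths))
    print([len(b) for b in batches])  # prints [10000, 10000, 5000]
```

Each batch would still have to be written in its own transaction for this to help, and whether a single transform run can do that against one media set is an assumption I have not verified.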

Upvotes: 1
