How to define partitions to Dataframe in pyspark?

Asked by Ani Menon
Tags: dataframe, pyspark, data-partitioning, apache-spark-1.6

Suppose I read a parquet file as a DataFrame in PySpark. How can I specify how many partitions it should have?

I read the parquet file like this:

    df = sqlContext.read.format('parquet').load('/path/to/file')

How can I specify the number of partitions to be used?

Upvotes: 0
Views: 267
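One possible approach to the question above, offered as a sketch rather than a definitive answer: as far as I know, the parquet reader in the question takes no partition-count argument, so the count is usually adjusted after loading. The helper name `set_partition_count` is hypothetical and not part of PySpark; it relies only on the DataFrame methods `repartition`, `coalesce`, and `rdd.getNumPartitions`.

```python
def set_partition_count(df, num_partitions):
    """Return a DataFrame with num_partitions partitions.

    Hypothetical helper, not part of PySpark. It expects an object that
    exposes the PySpark DataFrame methods used below.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")

    current = df.rdd.getNumPartitions()
    if num_partitions == current:
        # Nothing to change.
        return df
    if num_partitions < current:
        # coalesce merges existing partitions and avoids a full shuffle,
        # at the cost of possibly uneven partition sizes.
        return df.coalesce(num_partitions)
    # Increasing the count needs a full shuffle, which repartition performs.
    return df.repartition(num_partitions)


# Usage with the DataFrame from the question (100 is an illustrative value):
# df = sqlContext.read.format('parquet').load('/path/to/file')
# df = set_partition_count(df, 100)
# print(df.rdd.getNumPartitions())
```

Calling `df.repartition(n)` directly works in both directions. The split into `coalesce` and `repartition` is a design choice that skips the shuffle when the partition count is only being reduced.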

Answers (0)
