Reputation: 177
First, thank you for your time.
I need to cross (join) two Pandas DataFrames using the values I have in a field. The values appear inside the field in the following form: [A,B,C,N]
I am trying to apply a split() function to the DataFrame field as follows:
df_test = df_temp["NAME"].str.split(expand=True)
The "NAME" field is of type object.
My problem is that, for some reason, split() returns Null (NaN) values instead of the split parts of my NAME field. I don't understand what I'm doing wrong.
Thanks in advance.
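A common cause of this symptom (not confirmed by the question, so treat this as a hypothesis): if the column holds actual Python list objects rather than strings, every `.str` method returns NaN. A minimal reproduction with made-up sample data:

```python
import pandas as pd

# Hypothetical reproduction: the NAME column holds actual Python lists,
# so .str.split() returns NaN instead of the split parts.
df_lists = pd.DataFrame({"NAME": [["A", "B", "C", "N"]]})
print(df_lists["NAME"].str.split(expand=True))  # NaN, not the parts

# If the column instead holds strings, split works once a separator is given.
df_str = pd.DataFrame({"NAME": ["[A,B,C,N]"]})
print(df_str["NAME"].str.split(",", expand=True))  # 4 columns
```

Checking `type(df_temp["NAME"].iloc[0])` would distinguish the two cases.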
Upvotes: 0
Views: 461
Reputation: 1739
Based on the input shown in the question:
Input data -
from pyspark.sql.types import *
from pyspark.sql.functions import *
df = spark.createDataFrame([(['A', 'B', 'C', 'D'],), ], schema = ['Name'])
df.show()
+------------+
| Name|
+------------+
|[A, B, C, D]|
+------------+
Required Output -
df.select(explode(col("Name")).alias("exploded_Name")).show()
+-------------+
|exploded_Name|
+-------------+
| A|
| B|
| C|
| D|
+-------------+
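Since the question is about Pandas rather than Spark, here is a sketch of the equivalent in pandas (assuming the column holds list objects): `DataFrame.explode` emits one row per list element, much like Spark's `explode`.

```python
import pandas as pd

# Pandas equivalent of the Spark explode above (assumes list-valued column).
df = pd.DataFrame({"Name": [["A", "B", "C", "D"]]})

# explode() turns each element of the list into its own row.
exploded = df.explode("Name").rename(columns={"Name": "exploded_Name"})
print(exploded)
```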
Upvotes: 1
Reputation: 213
You should pass the separator to split on; by default, str.split() splits on whitespace, and "[A,B,C,N]" contains none.
df['Name'].str.split(',', expand=True)
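Since the values come wrapped in brackets, it may also help to strip those first. A sketch with made-up sample data:

```python
import pandas as pd

# Hypothetical sample: the field stores the list as a string "[A,B,C,N]".
df = pd.DataFrame({"Name": ["[A,B,C,N]"]})

# Strip the surrounding brackets, then split on the comma separator.
parts = df["Name"].str.strip("[]").str.split(",", expand=True)
print(parts)  # one row with columns A, B, C, N
```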
Upvotes: 0