Reputation: 111
Is there a way to split a snowpark dataframe column based on a string?
Here is what I have tried so far
from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF
connection_parameters = {
"account": "myaccount",
"user": "myuser",
}
session = Session.builder.configs(connection_parameters).create()
dh_session = session.table('tableName')
dh = dh_session.select(SF.to_timestamp(SF.col("timestamp")).as_("timestamp"),SF.col("name"))
# Split the name column by string delimiter '-AA' and get the first part
dh.select(SF.split("name",SF.lit("-AA").get_value(0)).as_("test")).show()
However I get an error message
AttributeError: 'Column' object has no attribute 'getItem'
Thanks
Upvotes: 1
Views: 1989
Reputation: 21
Example of first using a filter then select with sample dataframe. Note this uses a 'like' match which may or may not be an accurate regex for your data.
from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF
connection_parameters = {
"account": "myaccount",
"user": "myuser",
}
session = Session.builder.configs(connection_parameters).create()
##
## Create a dummy dataframe
##
dh = session.create_dataframe([["morning-CC", 2], ["hello-AA", 4],
["bye-BB", 7], ["another-AA", 3]], schema=["name", "col2"])
dh.filter("name like '%-AA%'").select(col("name"),
SF.split(col("name"),SF.lit("-"))[0].alias("test")).show()
"NAME" | "TEST" |
---|---|
hello-AA | "hello" |
another-AA | "another" |
Upvotes: 1
Reputation: 11
Instead of using get_value
, try using get
like so,
from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF
connection_parameters = {
"account": "myaccount",
"user": "myuser",
}
session = Session.builder.configs(connection_parameters).create()
dh_session = session.table('tableName')
dh = dh_session.select(SF.to_timestamp(SF.col("timestamp")).as_("timestamp"),SF.col("name"))
# Split the name column by string delimiter '-AA' and get the first part
dh = dh.select(SF.split("name",SF.lit("-AA")).as_("split_name"))
dh.select(SF.get(dh.split_name, SF.lit(0)).as_("name[0]")).show()
Upvotes: 1