Ananth
Ananth

Reputation: 111

Snowpark - split a dataframe column

Is there a way to split a snowpark dataframe column based on a string?

Here is what I have tried so far

from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF

connection_parameters = {
    "account": "myaccount",
    "user": "myuser",
    }  
session = Session.builder.configs(connection_parameters).create()

dh_session = session.table('tableName')
dh = dh_session.select(SF.to_timestamp(SF.col("timestamp")).as_("timestamp"),SF.col("name"))

# Split the name column by string delimiter '-AA' and get the first part 
dh.select(SF.split("name",SF.lit("-AA").get_value(0)).as_("test")).show()

However I get an error message

AttributeError: 'Column' object has no attribute 'getItem'

Thanks

Upvotes: 1

Views: 1989

Answers (2)

Kevin Chin
Kevin Chin

Reputation: 21

Example of first using a filter then select with sample dataframe. Note this uses a 'like' match which may or may not be an accurate regex for your data.

from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF

connection_parameters = {
    "account": "myaccount",
    "user": "myuser",
}  
session = Session.builder.configs(connection_parameters).create()
##
## Create a dummy dataframe
##
dh = session.create_dataframe([["morning-CC", 2], ["hello-AA", 4], 
    ["bye-BB", 7], ["another-AA", 3]], schema=["name", "col2"])

dh.filter("name like '%-AA%'").select(col("name"), 
    SF.split(col("name"),SF.lit("-"))[0].alias("test")).show()
"NAME" "TEST"
hello-AA "hello"
another-AA "another"

Upvotes: 1

Afroz Alam
Afroz Alam

Reputation: 11

Instead of using get_value, try using get like so,

from snowflake.snowpark import Session
from snowflake.snowpark import functions as SF

connection_parameters = {
    "account": "myaccount",
    "user": "myuser",
    }  
session = Session.builder.configs(connection_parameters).create()

dh_session = session.table('tableName')
dh = dh_session.select(SF.to_timestamp(SF.col("timestamp")).as_("timestamp"),SF.col("name"))

# Split the name column by string delimiter '-AA' and get the first part 
dh = dh.select(SF.split("name",SF.lit("-AA")).as_("split_name"))
dh.select(SF.get(dh.split_name, SF.lit(0)).as_("name[0]")).show()

Upvotes: 1

Related Questions