GB7

Reputation: 73

Wildcard character not working in pyspark dataframe

When I execute the following snippet of code, df1 returns no rows. When I substitute the wildcard character "*" with a literal digit such as "1", "2", or "3", df1 returns values. What am I missing?

from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import DataFrame
import pyspark.sql.functions
...
df1= df.filter(df.DATE == "*162014").filter(df.TMC == "111N04908")\
       .sort(df.EPOCH.asc())

Upvotes: 3

Views: 9668

Answers (2)

Tej

Reputation: 891

This should work. Note that rlike takes a regular expression, so the pattern needs to be '.*162014' (or the anchored '162014$'), not the glob '*162014' - a bare leading "*" is an invalid regex. The chained calls also need parentheses (or trailing backslashes) to continue across lines:

df1 = (df.filter(df.DATE.rlike('.*162014'))
         .filter(df.TMC == "111N04908")
         .sort(df.EPOCH.asc()))

where and filter are equivalent:

df1 = (df.where(df.DATE.rlike('.*162014'))
         .where(df.TMC == "111N04908")
         .sort(df.EPOCH.asc()))
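To see why the pattern matters: rlike interprets its argument as a regular expression, where "*" is a quantifier, not a wildcard. A quick sketch with Python's re module (illustrative only - Spark's rlike uses Java regex, which behaves the same way on these patterns; the sample date values are made up):

```python
import re

# Made-up sample values standing in for the asker's DATE column.
dates = ["5162014", "4162014", "1012015"]

# ".*162014" means "anything, followed by the literal 162014".
pattern = re.compile(".*162014")
matches = [d for d in dates if pattern.search(d)]
print(matches)  # ['5162014', '4162014']

# A bare "*162014" is a dangling quantifier and is rejected outright,
# which is why the original filter produced no result.
try:
    re.compile("*162014")
except re.error as exc:
    print("invalid regex:", exc)
```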

Upvotes: 1

2e4fa79c

Reputation: 36

== means equality - nothing more, nothing less. It doesn't use wildcards, regular expressions or SQL patterns. If you want pattern matching, use RLIKE (regular expressions) or LIKE (SQL patterns, where "%" matches any run of characters and "_" matches a single character). Note the wildcards are not interchangeable:

expr("DATE RLIKE '.*162014'")
expr("DATE LIKE '%162014'")
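The two pattern languages differ only in their wildcard syntax: a LIKE pattern can be mechanically translated into an anchored regex. A minimal sketch of that translation (a hypothetical helper for illustration, not Spark's actual implementation):

```python
import re

def like_to_regex(pattern: str) -> str:
    # Hypothetical helper: turn a SQL LIKE pattern into an anchored regex.
    # "%" -> ".*" (any run of characters), "_" -> "." (one character),
    # everything else is escaped and matched literally.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

regex = re.compile(like_to_regex("%162014"))
print(bool(regex.match("5162014")))  # True  - ends with 162014
print(bool(regex.match("1012015")))  # False
```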

Upvotes: 2
