GB7

Reputation: 73

Wildcard character not working in pyspark dataframe

When I execute the following snippet of code, df1 returns no rows. When I substitute the wildcard character "*" with a literal digit such as "1", "2", or "3", df1 returns values. What am I missing?

from __future__ import print_function
import sys
from pyspark import SparkContext
from pyspark.sql import SQLContext
from pyspark.sql import DataFrame
import pyspark.sql.functions
...
df1= df.filter(df.DATE == "*162014").filter(df.TMC == "111N04908")\
       .sort(df.EPOCH.asc())

Upvotes: 3

Views: 9668

Answers (2)

Tej

Reputation: 891

This should work. Note that rlike takes a regular expression, so the pattern needs to be '.*162014' (or the anchored '162014$'), not the glob '*162014' - a bare leading "*" is an invalid regex. The chained calls also need parentheses (or trailing backslashes) to continue across lines:

df1 = (df.filter(df.DATE.rlike('.*162014'))
         .filter(df.TMC == "111N04908")
         .sort(df.EPOCH.asc()))

where and filter are equivalent:

df1 = (df.where(df.DATE.rlike('.*162014'))
         .where(df.TMC == "111N04908")
         .sort(df.EPOCH.asc()))
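To see why the pattern matters: rlike interprets its argument as a regular expression, where "*" is a quantifier, not a wildcard. A quick sketch with Python's re module (illustrative only - Spark's rlike uses Java regex, which behaves the same way on these patterns; the sample date values are made up):

```python
import re

# Made-up sample values standing in for the asker's DATE column.
dates = ["5162014", "4162014", "1012015"]

# ".*162014" means "anything, followed by the literal 162014".
pattern = re.compile(".*162014")
matches = [d for d in dates if pattern.search(d)]
print(matches)  # ['5162014', '4162014']

# A bare "*162014" is a dangling quantifier and is rejected outright,
# which is why the original filter produced no result.
try:
    re.compile("*162014")
except re.error as exc:
    print("invalid regex:", exc)
```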

Upvotes: 1

2e4fa79c

Reputation: 36

== means equality - nothing more, nothing less. It doesn't use wildcards, regular expressions or SQL patterns. If you want pattern matching, use RLIKE (regular expressions) or LIKE (SQL patterns, where "%" matches any run of characters and "_" matches a single character). Note the wildcards are not interchangeable:

expr("DATE RLIKE '.*162014'")
expr("DATE LIKE '%162014'")
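The two pattern languages differ only in their wildcard syntax: a LIKE pattern can be mechanically translated into an anchored regex. A minimal sketch of that translation (a hypothetical helper for illustration, not Spark's actual implementation):

```python
import re

def like_to_regex(pattern: str) -> str:
    # Hypothetical helper: turn a SQL LIKE pattern into an anchored regex.
    # "%" -> ".*" (any run of characters), "_" -> "." (one character),
    # everything else is escaped and matched literally.
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return "^" + "".join(parts) + "$"

regex = re.compile(like_to_regex("%162014"))
print(bool(regex.match("5162014")))  # True  - ends with 162014
print(bool(regex.match("1012015")))  # False
```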

Upvotes: 2
