Reputation: 45
I have this CSV file:
CodiceFiscale;IVA;RAE;SAE
RDCRLR97F07F839F;11111111111;123;123
rdcrlr97f07f839f;11111111111;123;123
errorctrl1 ;34;12;1
error;12;12;1113
And I want to select all the rows where the CodiceFiscale column
matches this pattern: [A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9][A-Z][0-9][0-9][A-Z][0-9][0-9][0-9][A-Z].
I wrote this query in PySpark SQL:
SELECT * FROM soggetti WHERE CodiceFiscale LIKE ('[A-Z][A-Z][A-Z][A-Z][A-Z][A-Z][0-9][0-9][A-Z][0-9][0-9][A-Z][0-9][0-9][0-9][A-Z]')
But it doesn't work.
Upvotes: 1
Views: 1118
Reputation: 32660
Use this regex: ^[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]$
with the rlike
SQL function. LIKE
only supports the %
and _
wildcards, not character classes, which is why your query matches nothing.
The quantifier {n}
means the preceding token is matched exactly n
times. You also need to add ^
(start of string) and $
(end of string) so the pattern must match the entire CodiceFiscale
value:
data = [
("RDCRLR97F07F839F", 11111111111, 123, 123),
("rdcrlr97f07f839f", 11111111111, 123, 123),
("errorctrl1", 34, 12, 1),
("error", 12, 12, 1113)
]
df = spark.createDataFrame(data, ["CodiceFiscale", "IVA", "RAE", "SAE"])
df.createOrReplaceTempView("soggetti")
spark.sql("""
SELECT *
FROM soggetti
WHERE CodiceFiscale RLIKE '^[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]$'
""").show()
#+----------------+-----------+---+---+
#| CodiceFiscale| IVA|RAE|SAE|
#+----------------+-----------+---+---+
#|RDCRLR97F07F839F|11111111111|123|123|
#+----------------+-----------+---+---+
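The same pattern can be sanity-checked outside Spark with plain Python's re module (a minimal sketch, not part of the Spark job; re.fullmatch anchors at both ends, standing in for the ^ and $ used above):

```python
import re

# Same character classes and quantifiers as the RLIKE pattern above;
# re.fullmatch implicitly anchors at both ends, so ^ and $ are optional here.
pattern = re.compile(r"[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]")

rows = ["RDCRLR97F07F839F", "rdcrlr97f07f839f", "errorctrl1", "error"]
matches = [s for s in rows if pattern.fullmatch(s)]
print(matches)  # only the uppercase, correctly formatted code survives
```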
Upvotes: 1
Reputation: 42352
You need to use rlike
if you want to use regular expressions in the filter. Your regular expression can also be simplified: repeated groups can be expressed with curly-brace quantifiers. Note that rlike
matches anywhere in the string, so keep the ^
and $
anchors to require a full match:
select *
from soggetti
where CodiceFiscale rlike '^[A-Z]{6}([0-9]{2}[A-Z]){2}[0-9]{3}[A-Z]$'
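As a quick sanity check (plain Python re, an illustration rather than part of the Spark query), the compacted pattern accepts exactly the same strings as the longer spelled-out one on the sample rows:

```python
import re

# Verbose pattern from the other answer vs. the compacted form;
# re.fullmatch requires the whole string to match, like the ^...$ anchors.
verbose = r"[A-Z]{6}[0-9]{2}[A-Z][0-9]{2}[A-Z][0-9]{3}[A-Z]"
compact = r"[A-Z]{6}([0-9]{2}[A-Z]){2}[0-9]{3}[A-Z]"

samples = ["RDCRLR97F07F839F", "rdcrlr97f07f839f", "errorctrl1", "error"]
for s in samples:
    assert bool(re.fullmatch(verbose, s)) == bool(re.fullmatch(compact, s))
print("equivalent on the sample rows")
```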
Upvotes: 0