Reputation: 1903
When I run the below query:
SELECT load_dt, load_fname, exec_date FROM my_table
WHERE TO_DATE(exec_date) = '2021-10-18'
I see 3k rows that look like the below:
load_dt | load_fname | exec_date |
---|---|---|
2021-10-19 10:24:25.681 -0400 | FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_1247120.csv | 2021-10-18 09:00:00.000 |
2021-10-19 10:24:18.543 -0400 | FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_3154712.csv | 2021-10-18 09:00:00.000 |
I want to delete the 3k rows so I can reload the correct data based on the load_fname, which is the file location in s3
please note I'm getting the s3 file location using airflow:
s3_full_path = self.get_partitioned_location(date).replace(
f's3://{AWS_S3_BUCKET}/', '')
which is what I use in my query below after LIKE, it changes depending on date: FK/pi/2021-10-18/post_date=2021-10-12/
so I tried this but it came up empty:
SELECT load_dt, load_fname FROM my_table
WHERE load_fname like 'FK/pi/2021-10-18/post_date=2021-10-12/'
Upvotes: 4
Views: 121
Reputation: 6812
From the description, you want to match rows where the column load_fname
begins with the following:
FK/pi/2021-10-18/post_date=2021-10-12/
So, that means you want to match all of
FK/pi/2021-10-18/post_date=2021-10-12/
FK/pi/2021-10-18/post_date=2021-10-12/2
FK/pi/2021-10-18/post_date=2021-10-12/2021
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_987654321.xyz
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_ABCDEFGHIJKLMNOPQ.abc
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_1247120.csv
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12_3154712.csv
where the last 2 are the ones in your data extract. To acheive this, we simply add the wildcard %
to the end of the string:
WHERE load_fname like 'FK/pi/2021-10-18/post_date=2021-10-12/%'
The wildcard literally means - match rows where %
could be nothing, or any string.
As for the question in the comments:
I noticed adding "%" to the beginning of the string (like Dean Flinter mentioned) also works and seems to give me the same rows as "%" only at the end. Why is this?
based on the meaning of the wildcard (match rows where %
could be nothing, or any string) so if the string to match was:
%FK/pi/2021-10-18/post_date=2021-10-12/%
then this would, in addition to mathing all those I wrote above, ALSO match any string that starts with anything whatsoever of any length (or nothing at all) and is then followed by FK/pi/2021-10-18/post_date=2021-10-12/
plus anything matched by the trailing %
. So it would match, for example:
FK/pi/2021-10-18/post_date=2021-10-12/
FK/pi/2021-10-18/post_date=2021-10-12/abcde
abcde/FK/pi/2021-10-18/post_date=2021-10-12/
abcde/FK/pi/2021-10-18/post_date=2021-10-12/abcde
There are other wildcards you can use. For example. If you knew that there are exactly 4 characters AFTER the FK/pi/2021-10-18/post_date=2021-10-12/
then you could use
WHERE load_fname LIKE 'FK/pi/2021-10-18/post_date=2021-10-12/____
where we have used 4 underscore wild cards, each of which will match 1 and only 1 characters. So this would match:
FK/pi/2021-10-18/post_date=2021-10-12/2021
FK/pi/2021-10-18/post_date=2021-10-12/2022
FK/pi/2021-10-18/post_date=2021-10-12/2023
FK/pi/2021-10-18/post_date=2021-10-12/ABCD
FK/pi/2021-10-18/post_date=2021-10-12/abcd
but not
FK/pi/2021-10-18/post_date=2021-10-12/2021-10-12
FK/pi/2021-10-18/post_date=2021-10-12/A
FK/pi/2021-10-18/post_date=2021-10-12/
Upvotes: 1