Ryan

Reputation: 193

BASH find regex for arbitrary range of numbers in a large number of files

I am writing a BASH script that, among other things, copies files from one directory to another based on input arguments for the start and end dates. The filenames are of the format YYYYMMDDhhmmss.jpg, e.g. 20161230143922.jpg. I am using find ... -exec cp {} ... because there are tens of thousands of files in the source directory. The input arguments are the start and end date in the format YYYYMMDD.

I know that I can't do a simple range in the regex like ($startdate..$enddate), but I am unable to figure out how to programmatically generate a regex that would work. If I had fewer files I could simply do cp {$startdate..$enddate} destination, but alas I don't think that is feasible.

I would like to copy all files between $startdate and $enddate that fall between the hours of 0500 and 1700. This would include images like 20170102060635.jpg and 20170104131255.jpg, but not 20170103010022.jpg.

This is what I have so far:

#!/bin/bash

STARTDATE=$1
ENDDATE=$2
FILE_NAME="review-${STARTDATE}-${ENDDATE}.mp4"

if [[ -n "$STARTDATE" ]]; then
  echo "STARTDATE: $STARTDATE"
else
  echo "Invalid start date: '$STARTDATE'"
  echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
  exit
fi

if [[ -n "$ENDDATE" ]]; then
  echo "ENDDATE: $ENDDATE"
else
  echo "Invalid end date: '$ENDDATE'"
  echo "Syntax: ./create_time_lapse_date_range.sh <startdate> <enddate>"
  exit
fi

cd ~/Desktop/test\ timelapse

# Copy relevant files to local directory
find ~/Desktop/originals -regex "???????????????" -exec cp {} ~/Desktop/test\ timelapse/ \;

# Rename files to be sequential serial numbers
find ~/Desktop/test\ timelapse -name "*.jpg" | awk 'BEGIN{ a=0  }{ printf "mv \"%s\" ~/Desktop/\"test\ timelapse/%06d.jpg\"\n", $0, a++ }' | bash

# Generate timelapse video
ffmpeg -framerate 25 -i %06d.jpg -c:v libx264 -r 25 ${FILE_NAME}

Upvotes: 0

Views: 1218

Answers (1)

Brian Stephens

Reputation: 5261

Regex isn't the best tool for dealing with numerical ranges, so you may need to consider a solution that incorporates some logic outside the regex itself. Something like this:

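# Match the filename at the end of the path: YYYYMMDD, hhmm, ss, then .jpg
REGEX='/([0-9]{8})([0-9]{4})[0-9]{2}\.jpg$'
# Hour must be 05-17; kept in a variable because a quoted pattern
# on the right of =~ is matched as a literal string
TIME_REGEX='^(0[5-9]|1[0-7])'

for f in ~/Desktop/originals/*.jpg
do
    if [[ $f =~ $REGEX ]]
    then
        datepart=${BASH_REMATCH[1]}
        timepart=${BASH_REMATCH[2]}

        # if the DATE part is within the range
        if (( $STARTDATE <= $datepart )) && (( $datepart <= $ENDDATE ))
        then
            # if the TIME part starts with an hour from 05 to 17
            if [[ $timepart =~ $TIME_REGEX ]]
            then
                # copy file (destination taken from the question's script)
                cp "$f" ~/Desktop/test\ timelapse/
            fi
        fi
    fi
done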
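
Because the filenames are fixed width (YYYYMMDDhhmmss.jpg), the same check can also be written without any regex by slicing out the date and hour fields. This is only a minimal sketch under that assumption: the function name in_window and its argument order are hypothetical, and it treats the window as hours 05 through 17 inclusive, like the loop above.

```shell
# in_window FILE START END
# Succeeds (exit status 0) when FILE is named YYYYMMDDhhmmss.jpg, its date
# lies in START..END (both YYYYMMDD, inclusive) and its hour is 05..17.
# Hypothetical helper for illustration; uses only POSIX parameter expansion.
in_window() {
    iw_name=${1##*/}             # drop any leading directory
    iw_stamp=${iw_name%.jpg}     # drop the extension

    # Reject names that do not end in .jpg
    [ "$iw_stamp" != "$iw_name" ] || return 1
    # Reject anything that is not exactly 14 digits
    case $iw_stamp in
        *[!0-9]*) return 1 ;;
    esac
    [ "${#iw_stamp}" -eq 14 ] || return 1

    iw_date=${iw_stamp%??????}   # first 8 characters: YYYYMMDD
    iw_hms=${iw_stamp#????????}  # last 6 characters: hhmmss
    iw_hour=${iw_hms%????}       # first 2 of those: hh
    iw_hour=${iw_hour#0}         # strip a leading zero so 08 is read as 8

    [ "$iw_date" -ge "$2" ] && [ "$iw_date" -le "$3" ] &&
        [ "$iw_hour" -ge 5 ] && [ "$iw_hour" -le 17 ]
}

# Possible use inside the copy loop:
#   for f in ~/Desktop/originals/*.jpg; do
#       in_window "$f" "$STARTDATE" "$ENDDATE" && cp "$f" ~/Desktop/test\ timelapse/
#   done
```

With the question's examples and a range of 20170101 to 20170105, this accepts 20170102060635.jpg and 20170104131255.jpg and rejects 20170103010022.jpg.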

Pure Regex Solution

If you really want a pure regex solution, this will help demonstrate the complexity. Here's a regex to find all the files in the 0500 to 1700 timeframe, for dates in January 2017: ^201701\d{2}(0[5-9]|1[0-7])\d{4}\.jpg$

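Notice the regex pattern needed to match the hour field for times from 0500 to 1700. It accepts any hour from 05 through 17, so files stamped up to 17:59:59 are included: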

(0[5-9]|1[0-7])

It's not pretty, and that's with a hardcoded range. To deal with dynamic start and end dates, you would be building a similar pattern dynamically. It could be done, but why use regex for it?

Here's an example, showing what you would need to generate for a date range from 20161225 to 20170114:

^(201612(2[5-9]|3\d)|201701(0\d|1[0-4]))(0[5-9]|1[0-7])\d{4}\.jpg$
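
One caveat when feeding such a pattern to find: \d is PCRE-style shorthand that POSIX extended regex syntax does not define, so [0-9] is used instead, and find -regex matches against the whole path rather than just the filename. Below is a rough sketch of the same 20161225 to 20170114 pattern in that form. The names name_re, matches_name, and list_matches are hypothetical, and the find call assumes GNU find (BSD/macOS find would use find -E DIR -regex ... instead).

```shell
# Filename pattern for 20161225..20170114, hours 05..17, in POSIX ERE syntax.
name_re='(201612(2[5-9]|3[0-9])|201701(0[0-9]|1[0-4]))(0[5-9]|1[0-7])[0-9]{4}\.jpg'

# matches_name NAME: test a bare filename against the pattern with grep -E.
matches_name() {
    printf '%s\n' "$1" | grep -Eq "^${name_re}\$"
}

# list_matches DIR: print matching files under DIR.
# find -regex must match the entire path, hence the leading ".*/".
# Assumes GNU find; -regextype has to appear before -regex.
list_matches() {
    find "$1" -regextype posix-extended -regex ".*/${name_re}"
}
```

The find expression in list_matches could be followed by -exec cp {} ... \; as in the question's script, but the pattern still has to be regenerated for every new date range.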

Upvotes: 1
