Reputation: 2316
So we have a file named like this:
some-app-name_CT-111-some-title-with_underscore-in-it_c37a9a5fc272a5c94009a61ce8dff79900ab9102_2017-07-24-03-22-19.tar.bz2
As you can see, there are 4 parts: an app name (dasherized), a title (which may contain underscores), a hash code, and finally a timestamp (dasherized).
The parts are separated by underscores, and the problem is that the title may itself contain underscores. So how can we get the first part, then the last two parts (separated by underscores), and treat whatever remains as the title?
Any help is appreciated.
Final parts should be like:
Upvotes: 1
Views: 141
Reputation: 784888
Using bash regex you can do this:
s='some-app-name_CT-111-some-title-with_underscore-in-it_c37a9a5fc272a5c94009a61ce8dff79900ab9102_2017-07-24-03-22-19.tar.bz2'
re='^([^_]+)_([a-zA-Z0-9_-]+)_([a-fA-F0-9]+)_([0-9-]+)\.'
[[ $s =~ $re ]] && printf "AppName: %s\nTitle: %s\nID: %s\nTimestamp: %s\n" \
"${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" "${BASH_REMATCH[4]}"
AppName: some-app-name
Title: CT-111-some-title-with_underscore-in-it
ID: c37a9a5fc272a5c94009a61ce8dff79900ab9102
Timestamp: 2017-07-24-03-22-19
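If some filenames might not follow the pattern, it may help to check the match explicitly instead of silently printing nothing. The following is only a sketch that reuses the regex above; the function name parse_name and the variables app, title, id and ts are hypothetical names chosen for illustration, and it needs bash because of [[ =~ ]].

```shell
#!/usr/bin/env bash
# parse_name is a hypothetical helper, not part of the original answer.
# On a match it sets app, title, id and ts; otherwise it returns 1.
parse_name() {
  local re='^([^_]+)_([a-zA-Z0-9_-]+)_([a-fA-F0-9]+)_([0-9-]+)\.'
  [[ $1 =~ $re ]] || return 1
  app=${BASH_REMATCH[1]}
  title=${BASH_REMATCH[2]}
  id=${BASH_REMATCH[3]}
  ts=${BASH_REMATCH[4]}
}

s='some-app-name_CT-111-some-title-with_underscore-in-it_c37a9a5fc272a5c94009a61ce8dff79900ab9102_2017-07-24-03-22-19.tar.bz2'
if parse_name "$s"; then
  printf 'AppName: %s\nTitle: %s\nID: %s\nTimestamp: %s\n' "$app" "$title" "$id" "$ts"
else
  echo "unexpected filename: $s" >&2
fi
```

The title group is allowed to contain underscores, so the regex engine has to give characters back until the last two fields look like a hex string and a timestamp; that is what keeps the underscore inside the title.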
Upvotes: 2
Reputation: 16974
One way:
appname=$(echo "$x" | awk -F_ '{print $1}')
hcode=$(echo "$x" | awk -F_ '{print $(NF-1)}')
timestamp=$(echo "$x" | awk -F_ '{print $NF}' | grep -oE '[0-9-]{2,}')
title=$(echo "$x" | sed "s/.*${appname}_\(.*\)_$hcode.*/\1/")
where x is the variable containing the filename.
The hash code is retrieved by taking the second-to-last field, with _ as the delimiter. The timestamp is taken from the last field by keeping only runs of two or more digits and - characters, which drops the .tar.bz2 extension (the lone 2 in bz2 is too short to match). The title is whatever lies between the app name and the hash code.
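The same field-by-field idea can also be sketched with shell parameter expansion alone, which avoids spawning awk, sed and grep. This is a different technique from the commands above, not a restatement of them; the function name split_name is hypothetical, and the sketch assumes the app name contains no underscore and the extension starts at the first dot after the timestamp.

```shell
#!/usr/bin/env bash
# split_name is a hypothetical helper, not from the original answer.
# It sets appname, title, hcode and timestamp from a filename.
split_name() {
  local x=$1 rest
  appname=${x%%_*}             # text before the first underscore
  rest=${x#*_}                 # drop the app name and its underscore
  timestamp=${rest##*_}        # text after the last underscore
  timestamp=${timestamp%%.*}   # strip the extension (.tar.bz2)
  rest=${rest%_*}              # drop the timestamp field
  hcode=${rest##*_}            # the last remaining field is the hash
  title=${rest%_*}             # whatever is left is the title
}

x='some-app-name_CT-111-some-title-with_underscore-in-it_c37a9a5fc272a5c94009a61ce8dff79900ab9102_2017-07-24-03-22-19.tar.bz2'
split_name "$x"
printf '%s\n' "$appname" "$title" "$hcode" "$timestamp"
```

Peeling the fixed fields off both ends first means the title never has to be matched directly, so underscores inside it need no special handling.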
Upvotes: 1