SacDeb

Reputation: 25

Extract PATH from filename

I have a list of more than 100,000 file names (without the .ABC and .DEF extensions!) that need to be copied. At the moment I use a while loop combined with find on the /opt/project/ directory to generate the full PATH of each file so I can copy them later.

while read LINE; do find /opt/project/TOP3RST_0_/ -name "$LINE*"; done < TOP3RST_0_file.list > PATH_TOP3RST_0_file.list

This process is far too slow. I wonder whether I can use awk, sed or something else to build the full PATH directly from the file list. Being able to check that each file actually exists would be a bonus.

From this:

BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002
BT_SUPR_TOP3RST_0__20200716T005308_20200716T005352_0002
BT_SUPR_TOP3RST_0__20200716T005653_20200716T005748_0002
BT_SUPR_TOP3RST_0__20200716T005752_20200716T005824_0002
BT_SUPR_TOP3RST_0__20200716T010842_20200716T011051_0002

Expected output PATH should be like this:

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002.ABC

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002.DEF

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005308_20200716T005352_0002/BT_SUPR_TOP3RST_0__20200716T005308_20200716T005352_0002.ABC

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005308_20200716T005352_0002/BT_SUPR_TOP3RST_0__20200716T005308_20200716T005352_0002.DEF

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005653_20200716T005748_0002/BT_SUPR_TOP3RST_0__20200716T005653_20200716T005748_0002.ABC

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005653_20200716T005748_0002/BT_SUPR_TOP3RST_0__20200716T005653_20200716T005748_0002.DEF

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005752_20200716T005824_0002/BT_SUPR_TOP3RST_0__20200716T005752_20200716T005824_0002.ABC

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T005752_20200716T005824_0002/BT_SUPR_TOP3RST_0__20200716T005752_20200716T005824_0002.DEF

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T010842_20200716T011051_0002/BT_SUPR_TOP3RST_0__20200716T010842_20200716T011051_0002.ABC

/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T010842_20200716T011051_0002/BT_SUPR_TOP3RST_0__20200716T010842_20200716T011051_0002.DEF

Lastly, I need to calculate the time gap between the two timestamps in a file name:

BT_SUPR_TOP3RST_0__20200716T003457_20200716T004736_0002.ABC

20200716T003457 = 2020-07-16 00:34:57

20200716T004736 = 2020-07-16 00:47:36

I reckon something like datediff can calculate the gap?

Upvotes: 0

Views: 130

Answers (1)

KamilCuk

Reputation: 140970

The following sed line may let you get started:

$ sed 's@.*__\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\).*@/opt/project/TOP3RST_0_/\1/\2/\3/&/&@; s/.*/&.ABC\n&.DEF/' <<<'BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002'
/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002.ABC
/opt/project/TOP3RST_0_/2020/07/16/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002/BT_SUPR_TOP3RST_0__20200716T004902_20200716T005113_0002.DEF
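
To run the same expression over the whole list and get the existence check the question asks for as a bonus, something along these lines may work. It is only a sketch: it assumes GNU sed and one base name per line in the list file, and the function names build_paths and check_paths are made up for illustration.

```shell
# Sketch: build both candidate paths for every name in a list file and
# report which of them exist. Assumes GNU sed (for \n in the replacement).

build_paths() {
    # $1: list file with one base name per line (no extension)
    sed 's@.*__\([0-9]\{4\}\)\([0-9]\{2\}\)\([0-9]\{2\}\).*@/opt/project/TOP3RST_0_/\1/\2/\3/&/&@; s/.*/&.ABC\n&.DEF/' "$1"
}

check_paths() {
    # Reads paths on stdin: existing ones go to stdout, missing ones to stderr.
    while IFS= read -r path; do
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
        else
            printf 'missing: %s\n' "$path" >&2
        fi
    done
}

# Usage, with the file names from the question:
#   build_paths TOP3RST_0_file.list | check_paths > PATH_TOP3RST_0_file.list
```

This avoids walking the directory tree once per name, which is what makes the find loop slow: sed is started once for the whole list, and each path costs a single existence test.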

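The first s command matches the whole line and saves the year, month and day in the backreferences \1, \2 and \3. The replacement then builds the directory prefix from them and appends the original line twice, once as the directory name and once as the file name. The second s command outputs the result twice, once per suffix; the \n in its replacement is understood by GNU sed. For learning regexes I recommend the regex crosswords available on the net. This sed introduction is great, but only the s command is used here. FAQ: & stands for the whole matched text, and the s command may take any character as its delimiter (@ here, so the slashes in the path need no escaping).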
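
For the time gap, one possible approach is to convert both timestamps to epoch seconds and subtract. The sketch below assumes GNU date (for the -d option) and the NAME__START_END_NNNN layout shown in the question; to_epoch and time_gap are hypothetical helper names, and the timestamps are treated as UTC, which is a guess.

```shell
# Sketch: seconds between the two timestamps embedded in a file name.
# Assumes GNU date (-d) and timestamps of the form YYYYMMDDTHHMMSS.

to_epoch() {
    # $1: timestamp such as 20200716T003457
    iso=$(printf '%s\n' "$1" |
        sed 's/^\(....\)\(..\)\(..\)T\(..\)\(..\)\(..\)$/\1-\2-\3 \4:\5:\6/')
    # -u treats the value as UTC (an assumption) so DST shifts cannot skew the gap
    date -u -d "$iso" +%s
}

time_gap() {
    # $1: file name, with or without extension
    stamps=$(printf '%s\n' "$1" |
        sed 's/.*__\([0-9]\{8\}T[0-9]\{6\}\)_\([0-9]\{8\}T[0-9]\{6\}\)_.*/\1 \2/')
    start=${stamps% *}   # text before the space
    end=${stamps#* }     # text after the space
    echo $(( $(to_epoch "$end") - $(to_epoch "$start") ))
}

# Usage, with the two timestamps the question spells out:
#   time_gap BT_SUPR_TOP3RST_0__20200716T003457_20200716T004736_0002.ABC
#   prints 759 (seconds, i.e. 12 min 39 s)
```

A name that does not match the expected layout passes through sed unchanged and makes date fail, so real input may need a validity check first.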

Upvotes: 1
