Extract part of a file path between patterns using awk

Question

I am trying to extract data from a list of file paths so that I can create a log of files that have finished loading. The problem is that the paths are not laid out consistently, so I need to pull out the part of each path that lies between two regex patterns.

For example, say I want to pull out two pieces of information: the data between /system/.../ and the data between /data/.../sales/

/user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd
/user/project-x/system/mysql/data/regional/sales/yyyymmdd
/user/project-x/system/mysql/london/data/customers/sales/yyyymmdd
/user/project-x/system/oracle/data/tokyo-customers/Sales/yyyymmdd

So when I run the awk script I would be left with...

ibm      customers
mysql    regional
mysql    customers
oracle   tokyo-customers

Is there any way to do that type of file path splitting?

Jean-François Fabre · Accepted Answer

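If the parts you want sat at fixed positions, you would not need regexes at all. Field separation would do the trick:

awk -F/ '{print $5,$8}' test.txt

(where test.txt is your input file)

This tells awk to treat slashes as the field separator and print fields #5 and #8. Field #1 is the empty string before the leading slash, so system is field #4 and the name after it is always field #5. In your sample, though, the name before sales is field #8 on some lines and field #7 on others, so fixed positions only work for the first value.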
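To check which number awk assigns to each path component, it can help to print them all. The following is a minimal sketch; show_fields is a hypothetical helper name introduced here for illustration, not something from the question.

```shell
# Print every field of one path with its number, using "/" as the separator.
# show_fields is a hypothetical helper name used only for this illustration.
show_fields() {
    printf '%s\n' "$1" | awk -F/ '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

show_fields /user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd
```

For this sample path, field 1 is empty (the text before the leading slash), system is field 4, ibm is field 5 and customers is field 8.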

To answer your question with a field lookup instead, do this (it is more complicated, though):

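awk -F/ '{a="???";b="???";for (i=1;i<=NF;i++) {f=tolower($i); if (f=="system") a=$(i+1); else if (f=="sales") b=$(i-1)}; print a,b}' test.txt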

This splits the fields as before, but scans them for system and sales, then prints the field that follows system and the field that precedes sales. It works even when the fields are not at fixed positions.
If a pattern is not found on a line, ??? is printed in its place.

I have included a lowercase conversion because Sales appears in mixed case in one of the paths.
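If you would rather match on patterns, as the question asks, here is a sketch of a regex-based variant that uses awk's match() and substr() instead of the field lookup. It is a different technique from the one above. The function name extract_parts is hypothetical, and the sketch assumes exactly one path component sits between /data/ and /sales/.

```shell
# Regex-based variant: a sketch, not the lookup shown above.
# extract_parts is a hypothetical name; it reads paths from files or stdin.
extract_parts() {
    awk '{
        line = tolower($0)      # lowercase copy, so "Sales" matches too
        a = "???"; b = "???"    # placeholders when a pattern is absent
        # Component right after /system/ ("/system/" is 8 characters long).
        if (match(line, "/system/[^/]+"))
            a = substr($0, RSTART + 8, RLENGTH - 8)
        # Single component between /data/ (6 chars) and /sales/ (7 chars).
        if (match(line, "/data/[^/]+/sales/"))
            b = substr($0, RSTART + 6, RLENGTH - 13)
        print a, b
    }' "$@"
}

printf '%s\n' /user/project-x/system/oracle/data/tokyo-customers/Sales/yyyymmdd | extract_parts
```

The substr() calls read from the original line rather than the lowercase copy, so the extracted names keep their original case. With the input file from above, you would call it as extract_parts test.txt.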
