Reputation: 262
I am trying to extract data from a list of file paths as I am looking to create a log of files that have completed loading. The problem is that each file path is inconsistent so I need to look for part of the file path between two regex patterns.
For example, say I want to pull out two pieces of information: the component that follows /system/ and the component between /data/ and /sales/.
/user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd
/user/project-x/system/mysql/data/regional/sales/yyyymmdd
/user/project-x/system/mysql/london/data/customers/sales/yyyymmdd
/user/project-x/system/oracle/data/tokyo-customers/Sales/yyyymmdd
So when I run the awk script I would be left with...
ibm customers
mysql regional
mysql customers
oracle tokyo-customers
Is there any way to do that type of file path splitting?
Upvotes: 2
Views: 1695
Reputation: 140178
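The component after system always sits at the same position, so you can get it with plain field separation, no regexes needed:
awk -F/ '{print $5}' test.txt
(where test.txt is your input file)
This tells awk to treat slashes as the field separator and print field #5. Because every path starts with a slash, field #1 is empty, so system is field #4 and the name after it is field #5. The component after data is not at a fixed position, though (it is field #7 or #8 depending on whether there is a location directory such as nyc), so fixed field numbers cannot extract it.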
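To check which number each path component gets, here is a small sketch that prints every field of one sample path next to its index. The function name show_fields is only an illustrative choice, not part of the original answer.

```shell
# Print each slash-separated field with its index.
# The leading slash means field 1 is the empty string.
show_fields() {
  awk -F/ '{ for (i = 1; i <= NF; i++) print i ": " $i }' "$@"
}

printf '%s\n' '/user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd' |
  show_fields
```

Running it on the other sample paths shows how the position of data shifts when the location directory is missing.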
But to answer your question with field lookup do this (more complicated though)
awk -F/ '{a="???";b="???";for (i=1;i<NF;i++) {if (tolower($i)=="system") a=$(i+1); if (($i=="data") && (tolower($(i+2))=="sales")) b=$(i+1)}; print a,b}' test.txt
This splits the fields as before, but instead of using fixed positions it looks for the marker fields (system, and data followed two fields later by sales) and prints the field right after each marker. It works even when the fields are not at fixed positions.
If a pattern is nowhere to be found, this displays ??? instead.
I have included lowercase conversion since there's an occurrence of Sales in mixed case.
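The same lookup may be easier to read spread over several lines. This is a sketch of the one-liner above, wrapped in a shell function; the function name extract_fields and the variable names sys, data and name are illustrative choices, and as a small variation it lowercases the data marker as well.

```shell
# Multi-line version of the field lookup; reads files given as
# arguments, or standard input when called without arguments.
extract_fields() {
  awk -F/ '
  {
    sys = "???"; data = "???"          # placeholders if a marker is missing
    for (i = 1; i < NF; i++) {
      name = tolower($i)
      if (name == "system")
        sys = $(i + 1)                 # component right after system
      if (name == "data" && tolower($(i + 2)) == "sales")
        data = $(i + 1)                # component between data and sales
    }
    print sys, data
  }' "$@"
}

printf '%s\n' \
  '/user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd' \
  '/user/project-x/system/oracle/data/tokyo-customers/Sales/yyyymmdd' |
  extract_fields
```

With a file of paths, it can be called as extract_fields test.txt instead.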
Upvotes: 4
Reputation: 42017
With sed:
sed -E 's_.*/system/([^/]+).*/data/([^/]+)/[Ss]ales/.*_\1 \2_'
.*/system/([^/]+) matches everything up to /system/ and captures the component after it, up to the next /, in group 1
.*/data/([^/]+)/[Ss]ales/ skips ahead to /data/ and captures the component between /data/ and /sales/ (or /Sales/) in group 2
In the replacement, the two captured groups are used, separated by a space. The s command uses _ as its delimiter so the slashes in the pattern need no escaping.
Example:
$ cat file.txt
/user/project-x/system/ibm/nyc/data/customers/sales/yyyymmdd
/user/project-x/system/mysql/data/regional/sales/yyyymmdd
/user/project-x/system/mysql/london/data/customers/sales/yyyymmdd
/user/project-x/system/oracle/data/tokyo-customers/Sales/yyyymmdd
$ sed -E 's_.*/system/([^/]+).*/data/([^/]+)/[Ss]ales/.*_\1 \2_' file.txt
ibm customers
mysql regional
mysql customers
oracle tokyo-customers
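One thing to be aware of: when a line does not match, the s command leaves it unchanged and sed still prints it. If the input may contain unrelated paths, a possible variation is to add -n and the p flag so that only lines where the substitution succeeded are printed. The function name extract_with_sed and the unrelated sample path are illustrative, not from the original answer.

```shell
# Same substitution as above, but -n suppresses default output and the
# trailing p prints only lines where the substitution matched.
extract_with_sed() {
  sed -nE 's_.*/system/([^/]+).*/data/([^/]+)/[Ss]ales/.*_\1 \2_p' "$@"
}

printf '%s\n' \
  '/user/project-x/system/mysql/london/data/customers/sales/yyyymmdd' \
  '/var/log/unrelated.log' |
  extract_with_sed
```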
Upvotes: 1