Peter

Reputation: 564

Regex that extracts everything from the end of the line back to the last "/"

I'm writing a bash script in which I use grep with a regular expression to extract an ID that I will store in a variable.

The goal is to extract all characters after the last /, ignoring the trailing ' and } characters.

file.txt:

{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}

command:

cat file.txt | grep -oP "[/]+^"

The current command isn't working.

desired output:

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Upvotes: 0

Views: 112

Answers (6)

RavinderSingh13

Reputation: 133538

With jq you could try the following. First convert every ' to " with tr so that the input becomes valid JSON (going by your shown sample), then use jq's sub function to strip everything up to the last / and get the required output.

jq -r '.[] | sub(".*/";"")' <(tr "'" '"' < Input_file)

Or, if you want to look specifically at the name element, try the following:

jq -r '.name | sub(".*/";"")' <(tr "'" '"' < Input_file)

Upvotes: 1

The fourth bird

Reputation: 163362

If the data is always structured like that and you can use jq, translate the single quotes to double quotes, take the name property, and then take the last element after splitting on /:

tr "'" '"' < file | jq -r '.name | split("/") | last'

Output

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Upvotes: 1

RARE Kpop Manifesto

Reputation: 2837

echo "${inputdata}" | mawk ++NF OFS= FS='^.+/|[}\47]+$'

B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Upvotes: 0

Paul Hodges

Reputation: 15313

Basic parameter expansion.

$: x="$(<file.txt)"            # file contents in x
$: x="${x##*/}"                # strip through the last / to get rid of 'name' and the path
$: x="${x//[^[:alnum:]_=]}"    # strip anything not alphanumeric, _ or = to clean the end
$: echo "$x"
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=
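
A variation on the same idea, as a minimal sketch: instead of filtering by character class, remove the literal trailing '} so that any character other than / survives in the ID (including _). It assumes the line always ends in '} as in the sample. The variable names line, suffix and id are only illustrative, and the sample line is held in a variable rather than read from file.txt.

```shell
#!/usr/bin/env bash
# Sample line copied from the question (kept in a variable instead of file.txt).
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

suffix="'}"              # assumed: the line always ends with this closing quote and brace
id="${line##*/}"         # remove the longest prefix ending in /, i.e. through the last /
id="${id%"$suffix"}"     # remove the literal '} suffix if it is present
echo "$id"
```

For the sample line this prints B0g2_e8gG_xaZzpbliWvjlShnVdRNEw= and leaves the result in id for later use in the script.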

Upvotes: 1

Ed Morton

Reputation: 203684

Using any awk:

$ awk -F"[/']" '{print $(NF-1)}' file.txt
B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=

Upvotes: 1

jhnc

Reputation: 16761

The regex you gave was: [/]+^

It has a few mistakes:

  • Putting ^ at the end suggests you expect the software to search backwards from the end of the line. It can't: ^ only anchors a match to the start of the line;
  • [/] matches only the slash character.

Your sample shows what appears to be a malformed JSON object containing a key-value pair, each enclosed in single-quotes. JSON requires double-quotes so perhaps it is not JSON.

If several assumptions are made, it is possible to extract the section of the input that you seem to want:

  • file contains a single line; and
  • key and value are strings surrounded by single-quote; and
  • either:
    • the value part is immediately followed by }; or
    • the name part cannot contain /

You are using -P option to grep, so lookaround operators are available.

(?<=/)[^/]+(?=')
  • lookbehind declares match is preceded by /
  • one or more non-slash (the match)
  • lookahead declares match is followed by '
[^/]+(?='})
  • one or more non-slash (the match)
  • lookahead declares match is followed by ' then }

Note that the match begins as early in the line as possible and, because + is greedy, it is as long as possible.
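
To tie this back to the script in the question, here is a minimal sketch that captures the match in a variable. It assumes GNU grep built with PCRE support (needed for -P) and input that satisfies the assumptions listed above. The variable names line and id are illustrative, and the sample line is held in a variable rather than read from file.txt.

```shell
#!/usr/bin/env bash
# Assumes GNU grep with PCRE support (-P); sample line copied from the question.
line="{'name': 'projects/data/locations/us-central1/datasets/dataset/source1/messages/B0g2_e8gG_xaZzpbliWvjlShnVdRNEw='}"

# -o prints only the matched text; the lookahead requires '} right after the
# match but keeps those two characters out of the output.
id="$(printf '%s\n' "$line" | grep -oP "[^/]+(?='})")"
echo "$id"
```

With the sample line this prints B0g2_e8gG_xaZzpbliWvjlShnVdRNEw=. Reading from the file directly, as in grep -oP "[^/]+(?='})" file.txt, avoids the extra cat.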

Upvotes: 0
