Chris

Reputation: 1249

Join lines at pattern. Uneven interval

If I have this...

6,
9,
12
"url": "https://www.url.com"
6,
9,
12
"url": "https://www.url.com"
13,
16
"url": "https://www.url.com"
"url": "https://www.url.com"
18
"url": "https://www.url.com"
"url": "https://www.url.com"
3,
6,
14
"url": "https://www.url.com"
"url": "https://www.url.com"
20
"url": "https://www.url.com"
74
"url": "https://www.url.com"

How can I join the lines in a way that gives me this...

6,9,12"url": "https://www.url.com"
6,9,12"url": "https://www.url.com"
13,16"url": "https://www.url.com"
"url": "https://www.url.com"
18"url": "https://www.url.com"
"url": "https://www.url.com"
3,6,14"url": "https://www.url.com"
"url": "https://www.url.com"
20"url": "https://www.url.com"
74"url": "https://www.url.com"

I have tried using sed to delete the newline on lines that start with a number, but it doesn't work. I think it's because the lines are changing as it's working?

sed '/^[0-9]/N;s/\n//'

I get this...

6,9,
12"url": "https://www.url.com"
6,9,
12"url": "https://www.url.com"
13,16
"url": "https://www.url.com"
"url": "https://www.url.com"
18"url": "https://www.url.com"
"url": "https://www.url.com"
3,6,
14"url": "https://www.url.com"
"url": "https://www.url.com"
20"url": "https://www.url.com"
74"url": "https://www.url.com"

EDIT: Thanks for the help and explanations. They all worked, but I went with this one because it was the easiest for me to understand: sed ':a;/https/!{N;ba};s/\n//g'

Upvotes: 6

Views: 109

Answers (4)

Enlico

Reputation: 28500

The following code should work:

sed ':a;/https/!{N;ba};s/\n//g'

It is essentially a while loop that keeps appending the next line for as long as the accumulated multi-line pattern space does not contain https. As soon as an appended line contains https, the loop ends (the b command is no longer executed), and the s command removes all the embedded newlines \n.

In more detail, the script (between single quotes) can be rewritten like this:

:a        # label you can jump to with a t or b command
/https/!{ # if the line does not match "https" do what's in {…}:
    N     #   append the next line to the current one (putting "\n" in between)
    ba    #   branch to the line labelled as ":a"
}
s/\n//g   # change all newlines to empty strings (i.e. remove all newlines for the current multi-line)

The corresponding pseudo-code would be

begin
while line does not contain "https" {
  append another line
}
remove all newlines
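To try it on a shortened version of the sample from the question, something like the following should do. This is only a sketch: it assumes GNU sed (other seds may not accept the label and braces on one line), and the function name join_until_https is made up for illustration.

```shell
# Hypothetical wrapper around the sed loop above (GNU sed one-liner syntax).
join_until_https() {
  sed ':a;/https/!{N;ba};s/\n//g' "$@"
}

# Shortened sample input from the question, fed through stdin.
printf '%s\n' \
  '6,' '9,' '12' \
  '"url": "https://www.url.com"' \
  '"url": "https://www.url.com"' \
  '18' \
  '"url": "https://www.url.com"' | join_until_https
# prints:
# 6,9,12"url": "https://www.url.com"
# "url": "https://www.url.com"
# 18"url": "https://www.url.com"
```

Note that the loop keys on https rather than on the leading digits, so it relies on every group ending with a line that contains https.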

Upvotes: 5

petrus4

Reputation: 614

If your input is in file+.txt:

Save this as script+.sh, or whatever you want to call it.

#!/bin/bash -x

init () {
rm -v ./report+.txt

cat > edchop+.txt << EOF
1,${line}w temp
1,${line}d
wq
EOF

next
}

end () {
rm -v ./edchop+.txt
rm -v ./temp
exit 0
}

next () {
[[ -s file+.txt ]] && main
end
}

main () {
line=$(echo "/url/n" | ed -s file+.txt | cut -f1)
ed -s file+.txt < edchop+.txt
sed -i s'/com\"/com\"-/g' temp
cat temp | tr -d '\n' | tr '-' '\n' >> report+.txt
next
}

init

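In hindsight, this is a bit hacky. It uses the .com" at the end of the web address as the anchor where sed and tr insert a newline, so you will need to change that to whatever domain you are using. It also uses - as a temporary marker, so it will misbehave if your data contains hyphens.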

Upvotes: 0

Steve

Reputation: 54572

One way using awk:

awk '{ printf("%s%s", $0, /^[0-9]/ ? "" : "\n") }' file.txt
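For anyone wondering how this works: printf writes each line without its newline, and the newline is only added back when the line does not start with a digit. A small sketch to try it on part of the question's sample; the function name join_with_awk is made up for illustration, and it reads stdin instead of file.txt.

```shell
# Hypothetical wrapper around the awk one-liner above.
join_with_awk() {
  awk '{ printf("%s%s", $0, /^[0-9]/ ? "" : "\n") }' "$@"
}

# Part of the sample input from the question, fed through stdin.
printf '%s\n' \
  '3,' '6,' '14' \
  '"url": "https://www.url.com"' \
  '"url": "https://www.url.com"' \
  '20' \
  '"url": "https://www.url.com"' | join_with_awk
# prints:
# 3,6,14"url": "https://www.url.com"
# "url": "https://www.url.com"
# 20"url": "https://www.url.com"
```

One thing to be aware of: if the input ends with a line that starts with a digit, the output will not end with a newline.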

Upvotes: 5

KamilCuk

Reputation: 141900

sed '/^[0-9]/{H;d};H;s/.*//;x;s/\n//g'
  • /^[0-9]/ - If the line starts with a digit:
    • H - Append the line to the hold space.
    • d - Delete the pattern space and start the next cycle.
  • Otherwise (the line does not start with a digit):
    • H - Append the line to the hold space, after any digit lines already collected there.
    • s/.*// - Clear the pattern space, so that the hold space ends up empty after the swap.
    • x - Swap the pattern space with the hold space.
    • s/\n//g - Remove all the newlines.
    • The joined line, numbers included, is then printed at the end of the cycle.
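The steps above can be tried on a shortened version of the question's sample like this. It is only a sketch: it assumes GNU sed (the `};` after the braces is not accepted by every sed), and the function name join_digit_lines is made up for illustration.

```shell
# Hypothetical wrapper around the hold-space command above (GNU sed syntax).
join_digit_lines() {
  sed '/^[0-9]/{H;d};H;s/.*//;x;s/\n//g' "$@"
}

# Shortened sample input from the question, fed through stdin.
printf '%s\n' \
  '13,' '16' \
  '"url": "https://www.url.com"' \
  '"url": "https://www.url.com"' \
  '74' \
  '"url": "https://www.url.com"' | join_digit_lines
# prints:
# 13,16"url": "https://www.url.com"
# "url": "https://www.url.com"
# 74"url": "https://www.url.com"
```

Because digit lines are only printed once a non-digit line arrives, any digit lines at the very end of the input with no line after them are collected but never printed.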

Upvotes: 3
