Reputation: 21
I'd like an AWK command that joins words that are split across two lines:
The 1st part is at the end of a line and ends with "_".
The 2nd part is at the beginning of the next line.
(PS: some lines contain both a 2nd part and a 1st part, as in the example below.)
Example:
Bla bla bla bla SATU_
RDAY bla bla, bla bla
bla bla bla bla bla SUN_
DAY: bla bla bla bla M_
ONDAY. Bla bla bla bla TU_
ESDAY, bla bla bla.
Result:
Line 1: SATURDAY
Line 3: SUNDAY
Line 4: MONDAY
Line 5: TUESDAY
Upvotes: 0
Views: 170
Reputation: 203985
With GNU awk for multi-char RS:
$ awk -v RS='[[:alpha:]]+_\n[[:alpha:]]+' 'RT!=""{sub(/_\n/,"",RT); print RT}' file
SATURDAY
SUNDAY
MONDAY
TUESDAY
or with any awk:
$ awk 'w{w=w $1; gsub(/[^[:alpha:]]/,"",w); print w; w=""} /_$/{w=$NF}' file
SATURDAY
SUNDAY
MONDAY
TUESDAY
and if you really want the starting line numbers included then with any awk:
$ awk 'w{w=w $1; gsub(/[^[:alpha:]]/,"",w); printf "Line %d: %s\n", NR-1, w; w=""} /_$/{w=$NF}' file
Line 1: SATURDAY
Line 3: SUNDAY
Line 4: MONDAY
Line 5: TUESDAY
Upvotes: 1
Reputation: 6345
With GNU awk:
$ awk 'p{print gensub(/[^A-Z]$/,"","g",$1);p=0}/_$/{printf "%s",gensub("_","","g",$NF);p=1}' file
SATURDAY
SUNDAY
MONDAY
TUESDAY
Upvotes: 0
Reputation: 37424
$ awk 'p~/_$/{sub(/_$/,"",p);print "Line " (NR-1) ":", p $1}{p=$NF}' file
Line 1: SATURDAY
Line 3: SUNDAY:
Line 4: MONDAY.
Line 5: TUESDAY,
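The output above still carries the punctuation that followed the second half of each word (":", ".", ","). Here is a minimal sketch of one way to drop it, assuming that only non-letter characters at the end of the first field need to be removed and that the words are plain ASCII. The variable name w and the temporary sample file are illustrative additions, not part of the original answer:

```shell
# Recreate the sample input from the question in a temporary file.
input=$(mktemp)
cat > "$input" <<'EOF'
Bla bla bla bla SATU_
RDAY bla bla, bla bla
bla bla bla bla bla SUN_
DAY: bla bla bla bla M_
ONDAY. Bla bla bla bla TU_
ESDAY, bla bla bla.
EOF

# Same logic as the one-liner above, plus one extra step (an assumption):
# copy $1 into w and strip trailing non-letters before printing.
result=$(awk '
p ~ /_$/ {
    sub(/_$/, "", p)            # drop the joining underscore
    w = $1
    sub(/[^A-Za-z]+$/, "", w)   # drop trailing punctuation such as ":" or ","
    print "Line " (NR-1) ":", p w
}
{ p = $NF }                     # remember the last field for the next line
' "$input")

printf '%s\n' "$result"
rm -f "$input"
```

With the sample input this should print "Line 1: SATURDAY", "Line 3: SUNDAY", "Line 4: MONDAY" and "Line 5: TUESDAY", matching the result requested in the question.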
Upvotes: 2
Reputation: 439237
A POSIX-compliant solution:
awk '
firstPart != "" { sub(/[[:punct:]]$/, "", $1); print firstPart $1 }
$NF ~ /._$/ { firstPart=substr($NF, 1, length($NF) - 1); next }
{ firstPart= "" }
' file
Pattern (condition) firstPart != "" is true only if a token of interest was found on the previous line; only then is the associated action ({ ... }) executed:
  sub(/[[:punct:]]$/, "", $1) replaces (sub()) a trailing ($) instance of a punctuation character ([[:punct:]]), if any, in the 1st field ($1) with the empty string, thereby effectively removing it.
  print firstPart $1 prints the direct concatenation of the token of interest from the previous line with the (modified) 1st field, simply by placing firstPart and $1 next to each other; the space between them in the source code is not part of the output.
Pattern $NF ~ /._$/ tests if the last field ($NF) ends in ($) _ (preceded by at least 1 other character (.)).
  firstPart=substr($NF, 1, length($NF) - 1) stores the contents of the last field except for the trailing _ in variable firstPart.
  next skips processing of the remainder of the script for the line at hand and moves to the next line.
Action { firstPart= "" }, because it is not preceded by a pattern, is processed unconditionally - if reached:
  firstPart= "" signals to the next script cycle that nothing is to be printed for the next line.
Upvotes: 0
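The POSIX answer above relies on awk's concatenation-by-juxtaposition in print firstPart $1. As a small illustration (the variable names a, b, joined and spaced are made up for this sketch), compare juxtaposition with a comma-separated print, which inserts the output field separator (a space by default):

```shell
# Juxtaposition concatenates the two strings directly.
joined=$(awk 'BEGIN { a = "SATU"; b = "RDAY"; print a b }')

# A comma makes print emit OFS (a single space by default) between them.
spaced=$(awk 'BEGIN { a = "SATU"; b = "RDAY"; print a, b }')

printf '%s\n%s\n' "$joined" "$spaced"
```

The first line printed is SATURDAY and the second is SATU RDAY, which is why the answers in this thread join the two halves without a comma.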
Reputation: 2298
Not quite sure of all your requirements, but:
awk 'x {sub("[^A-Z].*", "", $1); print "Line "n": "x $1; x = ""}
sub("_$", "", $NF) {x = x $NF; n = NR}' input.txt
hth
Upvotes: 0