Ihe Onwuka

Reputation: 477

Matching non-whitespace in a field

In the data below I want to correctly distinguish the indented lines. Each line consists of two tab-separated fields, so each indented line starts with an invisible tab.

I would like to know why the following script, which tests for non-whitespace in the first field, prints only the second and second-to-last lines of the data pasted below instead of all lines that are not indented. Suggestions for a solution are welcome, but I would mainly like to know what is wrong with what I wrote.

Here is the script

BEGIN {FS="\t"; OFS="\t"}
  /\s*(directors)\s*$/ {type=$1; next}
  $1~/\S/ {print}

Data.

directors
&Oumlzkul, Ahmet Salih  Ii 2013
'Abd Al-Hamid, Ja'far   A Two Hour Delay 2001
    Badgeless sur la Croisette 2012
    Just Outside the Frame: The Profilmic Event and Beyond 2008
    Mesocafe 2009
    Mesocafé 2011
'D.J'Arlia, Domenic She'll Never Know 2012
    Cantarella 2011
    Makhno Beer 2010
'Kid Niagara' Kallet, Harry Drug Demon Romance 2012
'Kusare, Mak (I)    Baby Beautiful 2013/II
    Comrade 2008
'Kusare, Mak (II)   A Play Called a Temple Made of Clay 2014
'Legend' Spivey, Larry  The Crime City Diaries: Entry 1 - Crooked 2012
'Noble Julz'Hamilton, Ulia  Church Hurt 2015

Upvotes: 1

Views: 1548

Answers (1)

anubhava

Reputation: 784998

Use POSIX character classes for whitespace rather than the PCRE-style \s or \S:

awk 'BEGIN {FS=OFS="\t"}
   /[[:space:]]*directors[[:space:]]*$/ {type=$1; next}
   $1~/[^[:space:]]/' file

Note use of [[:space:]] instead of \s and [^[:space:]] instead of \S.
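
As for why the original script misbehaves: \s and \S are GNU extensions that gawk understands, but other awk implementations (mawk, for example) may read \S as a literal S. That would explain the output in the question, since the two printed records are the ones whose first field contains a capital S (Salih and Spivey). The sketch below runs the POSIX version on a small made-up sample; the sample function and its data are assumptions for illustration, not the asker's real file.

```shell
# Hypothetical sample: a "directors" header, then records with two
# tab-separated fields; continuation lines start with a tab.
sample() {
  printf 'directors\n'
  printf 'Abd Al-Hamid, Jafar\tA Two Hour Delay 2001\n'
  printf '\tMesocafe 2009\n'
  printf 'Spivey, Larry\tThe Crime City Diaries 2012\n'
  printf '\tComrade 2008\n'
}

# POSIX bracket expressions instead of \s and \S.
# With FS set to a tab, an indented line has an empty first field,
# so it cannot match /[^[:space:]]/ and is not printed.
result=$(sample | awk 'BEGIN {FS=OFS="\t"}
  /^[[:space:]]*directors[[:space:]]*$/ {type=$1; next}
  $1 ~ /[^[:space:]]/')

# Prints only the two non-indented records (Abd Al-Hamid and Spivey).
printf '%s\n' "$result"
```

If every indented line is known to start with exactly one tab, testing $1 != "" should give the same result here, because the first field of such a line is empty.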

Upvotes: 4
