Lucile Ter-Minassian

Reputation: 95

sed RE error: repetition-operator operand invalid

When I run the following shell script on macOS, I get the error "repetition-operator operand invalid".

It sounds as if two repetition operators appear consecutively, but I don't see that in my regex. Maybe ".+?" is considered invalid syntax? That would be surprising.

A code snippet:

export REGEX_DATETIME_DIFF="s/DATETIME_DIFF\((.+?),\s?(.+?),\s?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1, \2, '\3')/g"
export REGEX_SCHEMA='s/`physionet-data.(mimiciii_clinical|mimiciii_derived|mimiciii_notes).(.+?)`/\2/g'
export CONNSTR='-d mimic'

{ echo "${PSQL_PREAMBLE}; DROP TABLE IF EXISTS code_status; CREATE TABLE code_status AS "; cat code_status.sql; } | sed -E -e "${REGEX_DATETIME_DIFF}" | sed -E -e "${REGEX_SCHEMA}" | psql ${CONNSTR}

Any clue?

Upvotes: 5

Views: 3842

Answers (3)

Laurent Carrié

Reputation: 21

Installing the GNU version of sed fixed the problem for me, as I really need to use the (.*?) syntax:

brew install gnu-sed
export PATH="/opt/homebrew/opt/gnu-sed/libexec/gnubin:$PATH"
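
To confirm which sed the shell actually picks up after changing PATH, a quick check along these lines may help. It is only a sketch: it relies on GNU sed accepting `--version` while the BSD sed shipped with macOS rejects that option, and the variable name `sed_flavor` is made up for illustration.

```shell
# GNU sed understands --version; BSD sed (the macOS default) exits with an error.
if sed --version >/dev/null 2>&1; then
  sed_flavor="GNU"
else
  sed_flavor="BSD or other"
fi

# Show the flavor and the path of the sed that comes first on PATH.
echo "sed flavor: ${sed_flavor} ($(command -v sed))"
```

One caveat, as far as I know: GNU sed accepts (.+?) without complaining, but it does not treat it as a lazy match, so a negated character class is still the safer choice when the match has to stop at the first delimiter.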

Upvotes: 2

Lucile Ter-Minassian

Reputation: 95

@tshiono Sorry, I'll post the error messages I get, as that is easier. This is just a snippet to give you an idea; there are others (esp. as minute and hour are redundant columns).

Many thanks for the help, much appreciated!

LINE 47: ...          DATETIME_DIFF(CHARTTIME, charttime_lag, MINUTE)/60
                                                                                                                                                ^
LINE 153: ...when DATETIME_DIFF(charttime, charttime_prev_row, HOUR) <= 2
                                                               ^
ERROR:  column "hour" does not exist
LINE 192:   , DATETIME_DIFF(endtime, starttime, HOUR) AS duration_hour...
                                                                                                                ^
LINE 189:   , DATETIME_DIFF(endtime, starttime, HOUR) AS duration_hour...
                                                ^
LINE 192:   , DATETIME_DIFF(endtime, starttime, HOUR) AS duration_hour...
                                                ^
sed: 1: "-e
": invalid command code -

ERROR:  relation "elixhauser_quan" does not exist
LINE 101: from  elixhauser_quan;
                ^
ERROR:  relation "ventilation_durations" does not exist
LINE 14: left join ventilation_durations vd
ERROR:  column "second" does not exist
LINE 49:   and DATETIME_DIFF(ce.charttime, ie.intime, SECOND) > 0
                                                      ^
ERROR:  relation "kdigo_creatinine" does not exist
LINE 27:   FROM kdigo_creatinine cr
                ^
ERROR:  relation "vitals_first_day" does not exist
LINE 51: left join vitals_first_day v
                   ^

Upvotes: 1

tshiono

Reputation: 22022

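The regex (.+?), matches the shortest substring followed by a comma. The lazy quantifier +? is not part of POSIX extended regular expressions, which is what sed -E uses on macOS, hence the error. You can replace (.+?), with ([^,]+), and, likewise, (.+?)` with ([^`]+)`. Could you change the first two lines to: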

export REGEX_DATETIME_DIFF="s/DATETIME_DIFF\(([^,]+),\s?([^,]+),\s?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1, \2, '\3')/g"
export REGEX_SCHEMA='s/`physionet-data.(mimiciii_clinical|mimiciii_derived|mimiciii_notes).([^`]+)`/\2/g'
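
To try the two patterns without touching the database, something like the following can be run on its own. It is only a sketch: the two sample lines are made up for illustration (modeled on the queries in the question), and I have written [[:space:]]? in place of \s? as an assumption that the POSIX class is the more portable spelling across sed implementations.

```shell
# Hypothetical sample input, not taken from the real code_status.sql.
diff_line='  , DATETIME_DIFF(endtime, starttime, HOUR) AS duration_hour'
schema_line='from `physionet-data.mimiciii_clinical.chartevents` ce'

# [^,]+ stops at the first comma, which is what the lazy (.+?) was meant to do.
REGEX_DATETIME_DIFF="s/DATETIME_DIFF\(([^,]+),[[:space:]]?([^,]+),[[:space:]]?(DAY|MINUTE|SECOND|HOUR|YEAR)\)/DATETIME_DIFF(\1, \2, '\3')/g"
# [^`]+ stops at the closing backtick.
REGEX_SCHEMA='s/`physionet-data.(mimiciii_clinical|mimiciii_derived|mimiciii_notes).([^`]+)`/\2/g'

diff_result=$(printf '%s\n' "$diff_line" | sed -E -e "$REGEX_DATETIME_DIFF")
schema_result=$(printf '%s\n' "$schema_line" | sed -E -e "$REGEX_SCHEMA")

printf '%s\n' "$diff_result"     # the unit should now appear quoted, as 'HOUR'
printf '%s\n' "$schema_result"   # prints: from chartevents ce
```

If the first line still shows HOUR without quotes, the substitution did not match, and PostgreSQL will go on reading HOUR as a column name.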

Upvotes: 4
