user2456216

Reputation: 142

filter file with specific words linux

I have files in which the first column is an ID, and the second column is an option, like this:

$ cat file.txt
 1234;m11
 6758;m11;m14
 8796;mm14
 0303;m11

I need to create one file per option, containing the IDs that have exactly that option. That is:

file_m11.txt => (1234,0303)
file_m11_m14 => (6758)
file_mm14 => (8796)

I tried cat file.txt | grep -w "option" > file_option, but the output files need to be mutually exclusive, and instead the result is:

file_m11.txt => (1234,0303,*6758*)
file_m11_m14 => (6758)
file_mm14 => (8796,*6758*)

How can I prevent this from happening? (The option names can change.)

Upvotes: 1

Views: 68

Answers (2)

tink

Reputation: 15213

Not sure whether I fully understood the question (see comment above), but here goes.

If you save the following as e.g. split.awk (it uses gensub, so it needs GNU awk):

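{
  sub(/^[ \t]+/, "")                        # drop leading whitespace, if any
  id = gensub(/^([^;]+).*/, "\\1", 1)       # everything before the first ';'
  file = gensub(/^[^;]*;(.*)/, "\\1", 1)    # everything after the first ';'
  gsub(/;/, "_", file)                      # m11;m14 -> m11_m14
  store[file] = store[file] id ","          # collect the IDs per option set
}
END {
  for (options in store) {
    sub(/,$/, "", store[options])           # drop the trailing comma
    # '>' rather than '>>' so that a second run does not append duplicates
    print "(" store[options] ")" > ("file_" options ".txt")
  }
}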

And run it like so:

awk -f split.awk file.txt

This will create:

-rw-rw-r-- 1 tink   tink     7 2015-05-19 08:29 file_mm14.txt
-rw-rw-r-- 1 tink   tink    12 2015-05-19 08:29 file_m11.txt
-rw-rw-r-- 1 tink   tink     7 2015-05-19 08:29 file_m11_m14.txt

With the content as indicated above.

Upvotes: 2

Eric Renouf

Reputation: 14510

If m11;m14 is a single "option", you could modify your grep like this:

grep -P '^\s*\d+;option$' file > file_option

-P enables Perl-style regular expressions, which are often nicer to look at and easier to work with. The regular expression matches a line that starts with zero or more whitespace characters (spaces or tabs), then some digits, then a semicolon, then your option, and then the end of the line. So m14 won't match m11;m14 because the start of the line doesn't match the pattern, and m11 won't match m11;m14 because the end of the line doesn't match.

It won't put the parens or put everything on the same line as in your examples, but your attempt at the command won't do that either, so I'm assuming that isn't actually important right now.
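
Since the option names can change, the same anchored pattern could be applied once per distinct option string instead of typing each one by hand. The following is only a sketch: it assumes option names contain nothing but letters and digits (no regex metacharacters), it uses grep -E with POSIX character classes as a portable stand-in for the -P pattern, and the scratch directory and sample data exist only to make it runnable.

```shell
#!/bin/sh
# Work in a scratch directory so no existing files are overwritten.
cd "$(mktemp -d)" || exit 1

# Sample input taken from the question.
printf '%s\n' '1234;m11' '6758;m11;m14' '8796;mm14' '0303;m11' > file.txt

# Everything after the first ';' is treated as one option string.
cut -d';' -f2- file.txt | sort -u | while IFS= read -r option; do
    # Output name: replace ';' with '_' (m11;m14 -> file_m11_m14).
    out="file_$(printf '%s' "$option" | tr ';' '_')"
    # Anchored at both ends, so m11 does not also pick up m11;m14.
    grep -E "^[[:space:]]*[0-9]+;${option}\$" file.txt > "$out"
done
```

Each output file holds the full matching lines (for example 1234;m11), not just the IDs.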

Upvotes: 1
