Reputation: 501
If I have a csv similar to the following:
1999, random strings go here, £50.00, 983903893
1957, Another lacklustre line, £99.00, 3983093
1987, Adventure UK, £83.83, 39939
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
How can I delete line where field 2 has 3 or less words. So the output would be:
1999, random strings go here, £50.00, 983903893
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
I'm tempted to say I would like to use sed because it can change the file in place witout the need to create a new file.
I know how to refer to the 2nd field and remove individual words or symbols with sed from a csv. For example I could use this to remove a question mark:
sed -ri ':b s/^([^,]*,[^,]*)\?/\1 /g; t b'
But how do I tell said that I want to remove the entire line if there are three or less words? This is the bit i'm struggling with. Thanks.
EDIT: Here is an actual snippet of the file, please refer to this rather than my original example- it's in the same format but different data which seems to have affected how the suggested sed line in Jean-François Fabre answer processes the data:
142106729748,Rocky Horror Book,http://www.ebay.co.uk/itm/Rocky-Horror-Book-/142106729748,0.99
162189532196,Total Film Issue 10,http://www.ebay.co.uk/itm/Total-Film-Issue-10-/162189532196,0.75
162189528365,Total Film Issue 9,http://www.ebay.co.uk/itm/Total-Film-Issue-9-/162189528365,0.99
172328113931,Captain America 163 Silver Age,http://www.ebay.co.uk/itm/Captain-America-163-Silver-Age-/172328113931,2.5
232069020935,Football Picture Story Monthly,http://www.ebay.co.uk/itm/Football-Picture-Story-Monthly-/232069020935,0.25
262606117082,The geographical Tradition ,http://www.ebay.co.uk/itm/geographical-Tradition-/262606117082,10.0
401182170339,Naruto Official Fanbook,http://www.ebay.co.uk/itm/Naruto-Official-Fanbook-/401182170339,3.0
Upvotes: 1
Views: 93
Reputation: 140287
sed does it (barely, but does it)
sed -r '/^[^,]+, (\w+[, ]){4,}/!d' text.txt
small explanation:
{4,}
matches 4 or more words separated by space or comma (next field)!d
command does not delete those (delete non-matching ones)-r
option or some stuff won't work like \w
result:
1999, random strings go here, £50.00, 983903893
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
(well I had to fix the pound sign by hand actually :))
Edit: better just in case a TAB finds its way in the file (thanks potong)
sed -r '/^[^,]*,([[:space:]]+[^[:space:],]+){4}/!d'
Upvotes: 3
Reputation: 204558
sed is for simple substitutions on individual lines, that is all. For anything else you should be using awk:
$ awk -F' *, *' 'split($2,t,/ */)>3' file
1999, random strings go here, £50.00, 983903893
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
Upvotes: 0
Reputation: 42127
With awk
:
awk -F ', ' '{split($2, arr, " "); if(length(arr)>=4) print}' file.txt
split($2, arr, " ")
creates array arr
by splitting the second field on space
if(length(arr)>=4) print
prints the record only if the length of the array is >=4
Example:
% cat file.txt
1999, random strings go here, £50.00, 983903893
1957, Another lacklustre line, £99.00, 3983093
1987, Adventure UK, £83.83, 39939
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
% awk -F ', ' '{split($2, arr, " "); if(length(arr)>=4) print}' file.txt
1999, random strings go here, £50.00, 983903893
1945, North Wales is the Adrenaline Capital of Europe, £78.99, 83983
Upvotes: 1
Reputation: 242333
Perl solution:
perl -waF, -i~ -ne 'print if 3 < split " ", $F[1]' -- text.txt
-w
turns on warnings-n
reads the input line by line-a
splits each input line into the @F array-F
tells Perl how to split it, in this case it uses a comma-i
modifies the file in place, ~
will be used as a suffix for the backupsplit in scalar context returns the number of fields.
Upvotes: 2