Reputation: 24393
I have a CSV file, file1.csv, which has a custom format, with three columns, like this:
This is some data. [text] This is some more data.
Everything before the first [ is in the first column. Everything after the first ] is in the third column, no matter what content follows. E.g.:
First. [second] Third.
       ^      ^
I want to sort the lines of the file into two files, withnumbers.csv and withoutnumbers.csv, essentially by whether or not they contain numbers within the third column.
Later square brackets might appear, but they are not regarded as a new column; they are still part of the third column's data, e.g.:
First. [second] Third. [some more text] This is still in the third column.
       ^      ^
Lines containing numbers can match them like *0*, *1*, *2*, etc. These all contain numbers:
Water is H20.
The bear ate 2,120 fish.
The Wright Flyer flew in 1903.
Numbers found anywhere within a pair of square brackets in the third column do not count as a match, e.g., these lines would be sent to withoutnumbers.csv:
First. [second] Some text. [This has the number 1.]
First. [second] Some more text. [The Wright Flyer flew in 1903.]
These would be sent to withnumbers.csv, because they still have a number outside of the square brackets, but inside the third column:
First. [second] Some text with 1. [This has the number 1.]
First. [second] Some more text with the number 3. [The Wright Flyer flew in 1903.]
How can I sort the lines of the file into those containing numbers in the third column, not considering those numbers found within square brackets, and those lines not containing numbers?
Upvotes: 1
Views: 288
Reputation: 360133
This splits the line on the first closing square bracket, then checks whether the part after that bracket contains digits inside square brackets, or consists solely of non-digits. If so, it writes the line to withoutnumbers.csv. Otherwise, it writes the line to withnumbers.csv.
perl -lne 'BEGIN {open ND, ">", "withoutnumbers.csv"; open D, ">", "withnumbers.csv"} @fields = split(/]/,$_,2); $fields[1] =~ /\[.*?\d.*?\]|^\D+$/ ? print ND $_ : print D $_' file1.csv
Upvotes: 1
Reputation: 28029
Well, I'm not going to lie, I'm not loving the solution I came up with. However, your problem is rather peculiar and desperate times call for desperate measures. So, give this a try:
awk -F'\[[^\]]*\]' '{
    printed = 0
    for (i = 2; i <= NF; i++) {
        if ($i ~ /[0-9]+/) {
            print $0 >> "withNumbers"
            printed = 1
            break
        }
    }
    if (! printed) {
        print $0 >> "withoutNumbers"
    }
}' file
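To see why the loop starts at field 2, it can help to print the fields awk produces for one of the sample lines. This is only a sketch: show_fields is a made-up helper name, and the separator is written with bracket expressions ([[] and []]) rather than backslashes, on the assumption that this is less sensitive to how a particular awk treats escapes in -F.

```shell
# Hypothetical helper: print each field awk sees when every
# [bracketed group] is treated as the field separator.
show_fields() {
    printf '%s\n' "$1" |
        awk -F'[[][^]]*[]]' '{
            for (i = 1; i <= NF; i++)
                printf "%d=<%s>\n", i, $i
        }'
}

show_fields 'First. [second] Some text with 1. [This has the number 1.]'
```

Field 1 is the first column, and every later field is third-column text that sits outside any bracketed group, so a digit found in fields 2 through NF is one that should count.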
Upvotes: 3
Reputation: 246837
Here's a go
shopt -s extglob
rm -f withnumbers.csv withoutnumbers.csv   # -f: no error if the files do not exist yet
touch withnumbers.csv withoutnumbers.csv
while IFS= read -r line; do
    col3=${line#*\]}               # remove everything before and including the first ]
    col3=${col3//\[*([^]])\]/}     # remove all bracketed items
    if [[ $col3 == *[[:digit:]]* ]]; then
        printf "%s\n" "$line" >> withnumbers.csv
    else
        printf "%s\n" "$line" >> withoutnumbers.csv
    fi
done < file1.csv
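If extglob is not available (for example under plain sh), roughly the same idea can be sketched with sed doing the bracket removal instead. The function names classify and split_file are made up for illustration, and this swaps the extglob substitution for a sed call, so it is a variant rather than the loop above.

```shell
# Hypothetical helper: print "with" if the third column has a digit
# outside any [bracketed group], otherwise print "without".
classify() {
    col3=${1#*\]}                                          # drop everything up to the first ]
    col3=$(printf '%s\n' "$col3" | sed 's/\[[^]]*]//g')    # drop all bracketed groups
    case $col3 in
        *[0-9]*) echo with ;;
        *)       echo without ;;
    esac
}

# Hypothetical helper: split the file named in $1 into the two output files.
split_file() {
    : > withnumbers.csv       # start with empty output files
    : > withoutnumbers.csv
    while IFS= read -r line; do
        printf '%s\n' "$line" >> "$(classify "$line")numbers.csv"
    done < "$1"
}
```

Calling split_file file1.csv should then fill the two files. It starts one sed process per line, so it will likely be slower than the pure bash loop on large files.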
Upvotes: 1