Reputation: 60674
I have a data set that I cut-n-pasted from a Google Spreadsheet into my text editor (Sublime Text 2), and the data set doesnt' quite match my needs for processing.
In the form that it has coming from the spreadsheet, the data starts with one line of strings, one for each column, and then a number of rows with data; in the data rows, each column has either the value 1
or is blank. I don't know if the data is tab separated when it comes from the spreadsheet, but after pasting it in the text file it is not. If the last 1
in a row is not in the last column, the line is padded with spaces up until but not including the last column.
I tried doing something with awk
, but I couldn't figure out how to tackle the fact that space is both separator and column value. Next, I tried a few commands with sed
, including replacing repeated spaces with zeros and piping to another sed
which replaced 10
with 1 0
, but then I sometimes got extra zeros inserted and I don't know where in the respective rows that happened.
This is some example data (there are 13 columns in the real file). I've added $
as the character after the last one on the line, so you can see how far the lines are padded.
"1" "2" "3" "4" "1" "2" "3" "4"
1 1 $ 0 1 1 0
1 1 $ 1 0 0 1
1 $ 0 1 0 0
1 1 1 $ 1 1 0 1
I would like to end up with something like the right (and then I don't care about where the line ends) so I can process it with awk
.
And by the way, I have seen this question, which doesn't solve my problem since the solution there is based on the fact that the file is tab-delimited, with no value at all in the "empty" cells. To reiterate, my file is space-delimited, with spaces in the empty cells.
Upvotes: 1
Views: 6672
Reputation: 7610
My first attempt was not ok. So my 2nd 3rd 4th try based on the modified input with determining the number of columns automatically:
awk 'NR==1{for(;N<NF;++N)sp=" 0"sp}NR>1{$0=" "$0;sub(" +$","");gsub(" "," 0");$0=substr($0sp,2,2*N-1)}1'<<EOT
"1" "2" "3" "4"
1 1
1 1
1
1 1 1
EOT
First spaces are even, in between ones are odd, so I added a space at the beginning to let to use the same gsub for both cases. It is not clear how many trailing spaces present, so the script just chomp them. It contains the number of 0
number of field times. Substr
start from 2 to cut the added leading space, and lasts to (number of fields)*2-1
characters to cut the trailing space.
Output:
"1" "2" "3" "4"
0 1 1 0
1 0 0 1
0 1 0 0
1 1 0 1
Upvotes: 1
Reputation: 2615
sed 's/ /0/g;s/10/1 /g;s/00/0 /g;s/$/0 /' datafile.txt | cut -c 1-7
Upvotes: 0
Reputation: 60674
Trying to explain why a problem is hard is good for your chances to solve it. Just because I thought about the explanation here, I also came up with a solution =)
The solution works with sed
and basically in three steps:
Replace all empty first columns with 0:
cat datafile.txt | sed 's/^ /0 /g'
Replace all empty last columns with 0:
cat datafile.txt | sed 's/^ /0 /g' | sed 's/ $/ 0/g'
Here, I had to experiment a little with the number of spaces in the regex, to align all the new zeros right.
Replace all empty inner columns with 0:
cat datafile.txt | sed 's/^ /0 /g' | sed 's/ $/ 0/g' | sed 's/ / 0/g'
Here I also experimented with placing the 0 first or last in the replacement regex to get it right.
And of course, after this is done I redirect output to a file by tagging on > datafile-clean.txt
at the end.
There's probably a more elegant way to do this, so if you have one, please do post it even though I personally don't need the solution anymore.
Update: As shown in comments, this solution can be improved quite a lot. I'll leave the original solution up here, since I think it's more clear what it does, and in what order, but this should probably be used instead.
To start with, we don't need so many pipes; instead, we use the -e
flag on sed
:
sed -e 's/^ /0 /' -e 's/ $/ 0/' -e 's/ / 0/g' datafile.txt
This works just as-is, given that the first row with column headers doesn't have any double-spaces. If it does, one can just read the file instead with tail -n +2 datafile
and pipe to the above sed
command.
Upvotes: 0