Max Song

Reputation: 1687

Doing multi-staged text manipulation on the command line?

I have a file with a bunch of text in it, separated by newlines:

ex.

"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"

I want to use command line tools to count the number of characters in each line. If a line has more than X characters, I also want to split it on periods (".") and count the number of characters in each piece of the split line.

ex. of final output, by line number:

1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)

wc only seems to take a file name as input, so I'm having trouble directing it to take a text string and count its characters.

head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line

gives me the error: ": open: No such file or directory"

Upvotes: 3

Views: 79

Answers (2)

Mark Setchell

Reputation: 207853

awk is perfect for this. The code below should get you started and you can work out the rest:

awk -F. '{print length($0),NF,length($1)}'   yourfile

Output:

23 2 19
23 2 19
68 3 19
70 3 19

It uses a period as the field separator (-F.), prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).

Here is another little example that prints each line, then the length of the whole line ($0, which is what $i refers to when i is 0), then the length of each field except the last:

awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
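
Putting the pieces above together, here is a minimal sketch of the output format asked for in the question. The function name count_lines and the default threshold of 30 are placeholders of mine, not anything from the question, and it assumes the file holds plain sentences (no literal quotes or "\n" text). Empty pieces, such as the one after the final period, are skipped, and leading or trailing blanks are not counted.

```shell
# Hypothetical helper: count_lines FILE [THRESHOLD]
# Prints "N. LEN" for each line; when LEN exceeds THRESHOLD, also appends
# the length of every period-separated piece, e.g. "3. 64: 18, 43".
count_lines() {
    awk -v max="${2:-30}" '{
        out = NR ". " length($0)
        if (length($0) > max) {
            n = split($0, parts, /\./)       # split the line on literal periods
            sep = ": "
            for (i = 1; i <= n; i++) {
                piece = parts[i]
                gsub(/^[ \t]+|[ \t]+$/, "", piece)   # trim surrounding blanks
                if (piece == "") continue            # skip empty pieces
                out = out sep length(piece)
                sep = ", "
            }
        }
        print out
    }' "$1"
}
```

Call it as: count_lines yourfile 30. The exact counts depend on what the file really contains, so treat the numbers in the question as illustrative.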

By the way, "wc" can process strings sent to its stdin like this:

echo -n "Hello" | wc -c
5
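
The same idea can be applied once per line. This is only a sketch: line_lengths is a name I made up, and the tr call is there because some wc implementations pad their output with spaces.

```shell
# Hypothetical helper: print the character count of each line read on stdin.
line_lengths() {
    while IFS= read -r line || [ -n "$line" ]; do
        # printf '%s' writes the line without a trailing newline, so wc -m
        # counts only the characters of the line itself.
        printf '%s' "$line" | wc -m | tr -d ' '
    done
}
```

Use it at the end of a pipeline, for example: head -n2 processed.txt | line_lengths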

Upvotes: 2

How about:

head -n2 processed.txt | tr "." "\n" | wc -m

It helps to understand what xargs does and how pipes work. Look for a good tutorial on both before using them =).

xargs turns its input into command-line arguments for the next utility, and wc treats its arguments as file names, which is why you get "No such file or directory". That is not what you want here: you want wc to read the text itself on its stdin. So just pipe the entire output of tr to it.
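
If you do want to keep xargs in the pipeline, one possible sketch is to have it start a small shell per line that feeds the text to wc on stdin rather than as a file name. The name per_line_wc is hypothetical, and note that xargs itself interprets quotes and backslashes in its input, so this only suits simple text.

```shell
# Hypothetical helper: for each stdin line, pass the text to wc via stdin.
per_line_wc() {
    # -I{} makes xargs run the command once per input line; the line is
    # handed to sh as $1 and then piped into wc, so wc never sees a file name.
    xargs -I{} sh -c 'printf "%s" "$1" | wc -m | tr -d " "' _ {}
}
```

For example: head -n2 processed.txt | tr "." "\n" | per_line_wc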

Upvotes: 0
