Reputation: 1687
I have a file with a bunch of text in it, separated by newlines:
ex.
"This is sentence 1.\n"
"This is sentence 2.\n"
"This is sentence 3. It has more characters then some other ones.\n"
"This is sentence 4. Again it also has a whole bunch of characters.\n"
I want to use some set of command-line tools that will count the number of characters in each line and then, if a line has more than X characters, split it on periods (".") and count the number of characters in each piece of the split line.
Example of the final output, by line number:
1. 24
2. 24
3. 69: 20, 49 (i.e. "This is sentence 3" has 20 characters, "It has more characters then some other ones" has 49 characters)
wc only takes a file name as input, so I'm having trouble getting it to take in a text string and count its characters.
head -n2 processed.txt | tr "." "\n" | xargs -0 -I line wc -m line
gives me the error: ": open: No such file or directory"
Upvotes: 3
Views: 79
Reputation: 207853
awk is perfect for this. The code below should get you started and you can work out the rest:
awk -F. '{print length($0),NF,length($1)}' yourfile
Output:
23 2 19
23 2 19
68 3 19
70 3 19
It uses a period as the field separator (-F.), prints the length of the whole line ($0), the number of fields (NF), and the length of the first field ($1).
Here is another little example that prints the whole line, then its total length (the loop starts at i=0, and $0 is the whole line), then the length of each field except the last one:
awk -F. '{print $0;for(i=0;i<NF;i++)print length($i)}' yourfile
"This is sentence 1.\n"
23
19
"This is sentence 2.\n"
23
19
"This is sentence 3. It has more characters then some other ones.\n"
68
19
44
"This is sentence 4. Again it also has a whole bunch of characters.\n"
70
19
46
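Putting the pieces above together, here is a rough sketch of the full task from the question: print each line's length and, only when it exceeds a threshold, the length of each period-separated piece. The variable name max, its value of 30, and the temporary sample file are assumptions made for illustration. The sample lines have no literal quotes or \n text, so the counts differ from the output shown above.

```shell
# Hypothetical sample input; point awk at your own file instead.
sample=$(mktemp)
printf '%s\n' 'This is sentence 1.' 'This is sentence 2.' \
  'This is sentence 3. It has more characters than some other ones.' > "$sample"

# max is an assumed name for the threshold X from the question.
result=$(awk -F. -v max=30 '{
    out = NR ". " length($0)            # line number and total length
    if (length($0) > max) {
        sep = ": "
        for (i = 1; i <= NF; i++) {     # start at 1: $0 is the whole line
            if (length($i) == 0) continue   # skip the empty field after the last period
            out = out sep length($i)    # leading spaces are counted as characters
            sep = ", "
        }
    }
    print out
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

For this sample it should print "1. 19", "2. 19" and "3. 64: 18, 44". Pieces are counted without the periods themselves, and the second piece keeps its leading space; strip that in the loop if you do not want it counted.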
By the way, "wc" can process strings sent to its stdin like this:
echo -n "Hello" | wc -c
5
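A small sketch of the same idea wrapped in a function, in case it helps: count_chars is a made-up helper name, not a standard tool. It uses printf instead of echo -n (which is not portable across shells) and strips the padding that some wc implementations print around the number.

```shell
# count_chars is a hypothetical helper: count the characters of its first argument.
count_chars() {
    # printf '%s' writes the string without a trailing newline
    printf '%s' "$1" | wc -m | tr -d '[:space:]'
}

n=$(count_chars "Hello")
echo "$n"    # 5

# Plain echo appends a newline, which wc counts as well
with_newline=$(echo "Hello" | wc -c | tr -d '[:space:]')
echo "$with_newline"    # 6
```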
Upvotes: 2
Reputation: 384234
How about:
head -n2 processed.txt | tr "." "\n" | wc -m
You should understand better what xargs does and how pipes work. Do google for a good tutorial on those before using them =).
xargs turns its input into command-line arguments for the next utility, so wc receives your text as a file name and fails to open it. This is not what you want: you want wc to read all the lines on its standard input. So just pipe the entire output of tr to it.
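To make the difference concrete, here is a minimal sketch. The string in text and the name no-such-file are made up for illustration; echo is placed in front of wc only to show the command line that xargs would build.

```shell
text='one.two.three.'

# Pipe: wc reads the split text from its standard input
total=$(printf '%s\n' "$text" | tr '.' '\n' | wc -m | tr -d '[:space:]')
echo "$total"    # 15 (the newlines that replace the periods are counted too)

# xargs -I: the input is substituted for "line" as an ARGUMENT,
# so wc would be asked to open a file with that name
built=$(printf 'no-such-file\n' | xargs -I line echo wc -m line)
echo "$built"    # wc -m no-such-file
```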
Upvotes: 0