Reputation: 11
I'm really close. I read through "number of space between each word" and it does provide this line:
counts = [(len(list(cpart))) for c,cpart in groupby(s) if c == ' ']
but I really don't understand it... I understand, or am assuming, that c is the delimiter, s is what you're grouping by, and the resulting list (I'm new to Python; an array?) goes into counts (s refers to a previously instantiated variable).
AMOUNT DATE
NAME ACCOUNT# DISCOUNT DUE DUE
I am creating a program that lets me look at the headers of a randomly created COBOL output file and use them to create the associated PIC X clauses.
The important parts are the numbers. I can determine the lengths of the strings, obviously, but I'm not sure how to get the spaces...
Here is what I have so far, to show I'm working on it, lol:
from itertools import groupby
from test.test_iterlen import len
from macpath import split
from lib2to3.fixer_util import String
file = open("C:\\Users\\Joshua\\Desktop\\Practice\\cobol.cbl", 'r+')
line1 = file.readline()
split = line1.split()
print (split)
print ()
counts = [(len(list(cpart))) for c,cpart in groupby(split) if c == ' ']
print (counts)
index = 0
while index != split.__len__():
    if split[index].strip() != None:
        print ("PICX(" + ") VALUE " + "\"" + split[index] + "\".")
    elif counts[index] == None:
        print ("PICX(" + ") VALUE " + "\"" + split[index] + "\".")
    index+=1
Upvotes: 1
Views: 2463
Reputation: 13076
There's no particular point in breaking up the output like that. You could write:
05 FILLER (optional) PIC X(width-of-report) VALUE
" AMOUNT DATE "(in column 72)
- ".
The "-" is in column 7, and shows the continuation of an alphanumeric literal, which needs no opening quote, but needs a closing quote.
Your processing to create that is very simple. You always output those three lines; all you have to do is "chop" your data into 59 bytes (for the second line) and "the rest" (not knowing your report width) for the third line.
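As a rough Python sketch of that "chop" step: the 59 comes from the description above, while the function name chop_literal and the sample header are made up for illustration. It only splits the text and does not attempt to lay out the COBOL columns.

```python
def chop_literal(header, first_len=59):
    """Split a header line into the first first_len characters and the rest."""
    return header[:first_len], header[first_len:]


# Made-up 80-character report line, just to show the split.
header = "X" * 59 + "Y" * 21
first, rest = chop_literal(header)
print(len(first), len(rest))
```

The first piece would go on the second output line and the remainder on the continuation line.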
Upvotes: 0
Reputation: 46578
I'll begin by explaining the first line:
counts = [(len(list(cpart))) for c,cpart in groupby(s) if c == ' ']
s is actually the input string. So, to run this you'd start with:
s = " NAME ACCOUNT# DISCOUNT DUE DUE"
groupby(s) returns an iterator of tuples. The first value in that tuple is the character from the input string, and the second value is another (nested) iterator that will iterate through the repeated values of the character. Put into list form (for illustration) it would look like this:
groupby("hello!!!")
[('h', ['h']), ('e', ['e']), ('l', ['l', 'l']), ('o', ['o']), ('!', ['!', '!', '!'])]
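If you want to reproduce that list form yourself, one way is to turn each nested iterator into a list as you go. This is only a small sketch; the name groups is just for illustration.

```python
from itertools import groupby

# groupby yields (key, group_iterator) pairs; convert each group
# to a list so the whole structure can be printed.
groups = [(c, list(cpart)) for c, cpart in groupby("hello!!!")]
print(groups)
```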
So, c is not a delimiter; it's the variable that holds each character in the string s, and cpart is the iterator through all the consecutive values of c. Once you call list(cpart), it gives a list of [c,c,c,...] (each item is the same!), and the length of that list is the number of times that the character c is repeated. Normally it will just be one. For example, for the 'A' in 'NAME' you'll get c == 'A' and list(cpart) == ['A']. But for the spaces between NAME and ACCOUNT#, you'll get c == ' ' and list(cpart) == [' ',' ',' ',' ',' ',' ',' ',' ',' ',' '].
The whole thing being inside brackets [] means that it generates a list as if you were appending to a list within a for loop, and the value of each item is the expression before the for. Here, it's the len(list(cpart)) which counts the length of that list of repeated instances of a character. Thus, it'll be a list with the numbers of times a character is repeated. The if c == ' ' means that item will be added to the list only when that character is a space.
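Written out as an explicit loop, the comprehension is roughly equivalent to the following. The sample string here is made up (three spaces, then two) so the result is easy to check by eye.

```python
from itertools import groupby

s = "NAME   ACCOUNT#  DUE"  # made-up sample: 3 spaces, then 2 spaces

counts = []
for c, cpart in groupby(s):
    if c == ' ':                          # keep only the runs of spaces
        counts.append(len(list(cpart)))   # length of this run

print(counts)  # [3, 2]
```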
The above will count the spaces. To count the words (e.g., to get PIC X(6) VALUE "AMOUNT") you can simply do something like:
word_counts = [ len(word) for word in s.split() ]
where split (which you have used) returns a list of words that had been previously one string separated by spaces.
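If you need the words and the gaps together, in order, one possible approach is to group on whether each character is whitespace instead of on the character itself. This is only a sketch: the function name pic_lines, the sample string, and the exact output format (including using VALUE SPACES for the gaps) are my assumptions, not something from your file.

```python
from itertools import groupby


def pic_lines(s):
    """Return one PIC X line per run of spaces or run of non-space characters."""
    lines = []
    # key=str.isspace groups consecutive whitespace characters together
    # and consecutive non-whitespace characters together.
    for is_space, run in groupby(s, key=str.isspace):
        text = "".join(run)
        if is_space:
            lines.append("PIC X(%d) VALUE SPACES." % len(text))
        else:
            lines.append('PIC X(%d) VALUE "%s".' % (len(text), text))
    return lines


for line in pic_lines("  NAME   ACCOUNT#  DUE"):  # made-up sample header
    print(line)
```

Each word gets its own length and each gap gets its own width, so the widths add up to the length of the original line.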
Upvotes: 3