
How to parse this string?

I have a string like the one below:

>>> string = """00 1f 00@ 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/
 00) 00 1f 00 1c 00  00  00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff 
e0 ff da ff d2 ff cc ff c9 ff c5 ff c3 ff c3 ff c1 ff ba ff b3 ff b1 ff b2 
ff b3 ff b3 ff b6 ff ba ff b7 ff ab ff 9c ff 93 ff 93 ff 9c ff a5 ff aa ff 
aa ff a3 ff 9a ff 92 ff 95 ff a3 ff b2 ff b9 ff bd ff cb ff df ff e8 ff dd 
ff ca ff c8 ff d9 ff eb ff f1 ff ee ff f0 ff fe ff 10 00 1d 00 1f 00 18 00 
0f 00 0e 00 17 00" 00\' 00% 00\' 001 00= 00D 00G 00I 00M 00O 00N 00M 00M 00O 
00N 00K 00F 00C 00F 00M 00M 00@ 00. 00$ 00& 00* 00& 00 1d 00 17 00 17 00 17 
00 13 00 0f 00 13 00 1f 00* 00/ 00/ 00/ 00/ 00/ 00- 00\' 00 1f 00 18 00 13 
00 10 00 0c 00 08 00 03 00 00 00 fd ff f8 ff f3 ff f1 ff ee ff e8 ff e1 ff 
e1 ff e9 ff f0 ff f0 ff e9 ff e4 ff e1 ff dd ff da ff d8 ff d9 ff d7 ff d2 
ff cc ff c9 ff ca ff ca ff c9 ff c9 ff cd ff d1 ff d3 ff d1 ff d0 ff d1 ff
 d5 ff d9 ff dd ff e1 ff e6 ff e9 ff e9 ff e9 ff ec ff f1 ff f5 ff f8 ff fc 
ff 01 00 03 00 00 00 fb ff f9 ff fc ff 00 00 01 00 01 00 00 00 ff ff ff ff 
01 00 01 00 ff ff fb ff f9 ff fb ff ff ff 01 00 01 00 00 00 ff ff fe ff 00 
00 0b 00 17 00 1c 00 18 00 11 00 0f 00 11 00 10 00 0c 00 08 00 07 00 08 00\t 
00 08 00 04 00 00 00 fc ff f8 ff f7 ff f8 ff f9 ff"""

As you can see, it consists almost entirely of two-character groups, but occasionally a group has more or fewer than two characters. I want to write a program that gives me those groups, or just the excess characters. For example, for the first line I want to get @ c e N > E O F 2 & * /. How can I do that?

I have already tried the following program:

for word in string:
    if (len(word) !=2):
        print(word[2:], end =' ')

but it doesn't work as expected.

Upvotes: 0

Answers (7)

gl051

Reputation: 569

You can tokenize the string, use the filter function to keep only the words that are longer than 2 characters, and then extract the extra characters.

[x[2:] for x in filter(lambda x: len(x) > 2,string.split())]

Output:

['@', 'c', 'e', 'N', '>', 'E', 'O', 'F', '2', '&', '*', '/', ')', '"', "'", '%', "'", '1', '=', 'D', 'G', 'I', 'M', 'O', 'N', 'M', 'M', 'O', 'N', 'K', 'F', 'C', 'F', 'M', 'M', '@', '.', '$', '&', '*', '&', '*', '/', '/', '/', '/', '/', '-', "'"]

Upvotes: 0

albert

Reputation: 8623

#!/usr/bin/env python3
# coding: utf-8

s = """00 1f 00@ 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/
 00) 00 1f 00 1c 00  00  00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff 
e0 ff da ff d2 ff cc ff c9 ff c5 ff c3 ff c3 ff c1 ff ba ff b3 ff b1 ff b2 
ff b3 ff b3 ff b6 ff ba ff b7 ff ab ff 9c ff 93 ff 93 ff 9c ff a5 ff aa ff 
aa ff a3 ff 9a ff 92 ff 95 ff a3 ff b2 ff b9 ff bd ff cb ff df ff e8 ff dd 
ff ca ff c8 ff d9 ff eb ff f1 ff ee ff f0 ff fe ff 10 00 1d 00 1f 00 18 00 
0f 00 0e 00 17 00" 00\' 00% 00\' 001 00= 00D 00G 00I 00M 00O 00N 00M 00M 00O 
00N 00K 00F 00C 00F 00M 00M 00@ 00. 00$ 00& 00* 00& 00 1d 00 17 00 17 00 17 
00 13 00 0f 00 13 00 1f 00* 00/ 00/ 00/ 00/ 00/ 00- 00\' 00 1f 00 18 00 13 
00 10 00 0c 00 08 00 03 00 00 00 fd ff f8 ff f3 ff f1 ff ee ff e8 ff e1 ff 
e1 ff e9 ff f0 ff f0 ff e9 ff e4 ff e1 ff dd ff da ff d8 ff d9 ff d7 ff d2 
ff cc ff c9 ff ca ff ca ff c9 ff c9 ff cd ff d1 ff d3 ff d1 ff d0 ff d1 ff
 d5 ff d9 ff dd ff e1 ff e6 ff e9 ff e9 ff e9 ff ec ff f1 ff f5 ff f8 ff fc 
ff 01 00 03 00 00 00 fb ff f9 ff fc ff 00 00 01 00 01 00 00 00 ff ff ff ff 
01 00 01 00 ff ff fb ff f9 ff fb ff ff ff 01 00 01 00 00 00 ff ff fe ff 00 
00 0b 00 17 00 1c 00 18 00 11 00 0f 00 11 00 10 00 0c 00 08 00 07 00 08 00\t 
00 08 00 04 00 00 00 fc ff f8 ff f7 ff f8 ff f9 ff"""


for grp in s.split():
    if len(grp) != 2:
        print(grp[2:])
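For comparison, the loop in the question iterates over the string itself, which yields one character at a time, so `len(word)` is always 1 and `word[2:]` is always empty. A small sketch on a shortened sample (the names `sample`, `chars`, `groups` and `extras` are only illustrative):

```python
sample = "00 1f 00@ 00c"

# Iterating a str yields single characters, never whole groups.
chars = [ch for ch in sample]
print(chars[:3])   # ['0', '0', ' ']

# Splitting on whitespace first yields the groups themselves.
groups = sample.split()
print(groups)      # ['00', '1f', '00@', '00c']

# Only now can the length test pick out the oversized groups.
extras = [grp[2:] for grp in groups if len(grp) != 2]
print(extras)      # ['@', 'c']
```

This is why the `split()` call is the essential change.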

Updated version to additionally find groups shorter than two chars:

found_because_longer_than_two_chars = list()
found_because_shorter_than_two_chars = list()

for grp in s.split():
    if len(grp) > 2:
        found_because_longer_than_two_chars.append(grp[2:])
        print(grp[2:])
    elif len(grp) < 2:
        found_because_shorter_than_two_chars.append(grp)
        print(grp)

print(found_because_longer_than_two_chars)
print(found_because_shorter_than_two_chars)

Upvotes: 0

mhawke

Reputation: 87134

It's not entirely clear to me whether you also want single character strings included in the output. From your question:

"But rarely we have groups of more than or less than 2 letters."

You mention them as a possibility although there are none present in your example data. So, if you do want single character strings then this will do the job:

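for word in s.split():
    length = len(word)
    if length > 2:
        # group is too long: print only the extra characters
        print(word[2:], end=' ')
    elif length < 2:
        # group is too short: print it as is
        print(word)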

Or, as a list comprehension

[word[2 if len(word)>2 else 0:] for word in s.split() if len(word) != 2]

on which you can join the elements if desired.
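As a rough illustration of the joining step, here is the same comprehension on a shortened, made-up sample that contains one single-character group (the real data in the question has none):

```python
# Shortened, made-up sample; "7" is a hypothetical single-character group.
s = "00 1f 00@ 7 00c ff"

# Keep the extra characters of long groups and the whole of short groups.
extras = [word[2 if len(word) > 2 else 0:] for word in s.split() if len(word) != 2]

joined = " ".join(extras)
print(joined)  # @ 7 c
```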

Upvotes: 0

jhoepken

Reputation: 1858

Split the string into groups and check which list elements are longer than 2 characters.

string = """00 1f 00@ 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/
 00) 00 1f 00 1c 00  00  00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff 
e0 ff da ff d2 ff cc ff c9 ff c5 ff c3 ff c3 ff c1 ff ba ff b3 ff b1 ff b2 
ff b3 ff b3 ff b6 ff ba ff b7 ff ab ff 9c ff 93 ff 93 ff 9c ff a5 ff aa ff 
aa ff a3 ff 9a ff 92 ff 95 ff a3 ff b2 ff b9 ff bd ff cb ff df ff e8 ff dd 
ff ca ff c8 ff d9 ff eb ff f1 ff ee ff f0 ff fe ff 10 00 1d 00 1f 00 18 00 
0f 00 0e 00 17 00" 00\' 00% 00\' 001 00= 00D 00G 00I 00M 00O 00N 00M 00M 00O 
00N 00K 00F 00C 00F 00M 00M 00@ 00. 00$ 00& 00* 00& 00 1d 00 17 00 17 00 17 
00 13 00 0f 00 13 00 1f 00* 00/ 00/ 00/ 00/ 00/ 00- 00\' 00 1f 00 18 00 13 
00 10 00 0c 00 08 00 03 00 00 00 fd ff f8 ff f3 ff f1 ff ee ff e8 ff e1 ff 
e1 ff e9 ff f0 ff f0 ff e9 ff e4 ff e1 ff dd ff da ff d8 ff d9 ff d7 ff d2 
ff cc ff c9 ff ca ff ca ff c9 ff c9 ff cd ff d1 ff d3 ff d1 ff d0 ff d1 ff
 d5 ff d9 ff dd ff e1 ff e6 ff e9 ff e9 ff e9 ff ec ff f1 ff f5 ff f8 ff fc 
ff 01 00 03 00 00 00 fb ff f9 ff fc ff 00 00 01 00 01 00 00 00 ff ff ff ff 
01 00 01 00 ff ff fb ff f9 ff fb ff ff ff 01 00 01 00 00 00 ff ff fe ff 00 
00 0b 00 17 00 1c 00 18 00 11 00 0f 00 11 00 10 00 0c 00 08 00 07 00 08 00\t 
00 08 00 04 00 00 00 fc ff f8 ff f7 ff f8 ff f9 ff"""

# split() with no argument splits on any whitespace, including newlines
data = string.split()

output = []

for dI in data:
    if len(dI) > 2:
        output.append(dI)

print(" ".join(output))

Alternatively you can shorten the for-loop to the following:

data = string.split()

output = [dI for dI in data if len(dI) > 2]

print(" ".join(output))

Even more compressed, this could look like:

print(" ".join(dI for dI in string.split() if len(dI) > 2))
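Whether to call `split(" ")` or `split()` matters for multi-line input like this: `split(" ")` splits only on single spaces, so newlines stay attached to the groups, while `split()` with no argument splits on any run of whitespace. A minimal sketch on a made-up two-line sample (the variable names are illustrative):

```python
# Made-up sample with line breaks in the same places as the real data.
text = "00@ 00/\n 00) ff \ne0 ff"

by_space = text.split(" ")
by_whitespace = text.split()

print(by_space)       # ['00@', '00/\n', '00)', 'ff', '\ne0', 'ff']
print(by_whitespace)  # ['00@', '00/', '00)', 'ff', 'e0', 'ff']

# With split(" "), '\ne0' has length 3 and would wrongly count as a long group.
wrongly_long = [grp for grp in by_space if len(grp) > 2 and "\n" in grp]
print(wrongly_long)   # ['00/\n', '\ne0']
```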

Upvotes: 0

ZdaR

Reputation: 22964

You can do it in one line with a generator expression.

sentence = r""" 00 1f 00@ 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/
 00) 00 1f 00 1c 00  00  00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff 
e0 ff da ff d2 ff cc ff c9 ff c5 ff c3 ff c3 ff c1 ff ba ff b3 ff b1 ff b2 
ff b3 ff b3 ff b6 ff ba ff b7 ff ab ff 9c ff 93 ff 93 ff 9c ff a5 ff aa ff 
aa ff a3 ff 9a ff 92 ff 95 ff a3 ff b2 ff b9 ff bd ff cb ff df ff e8 ff dd 
ff ca ff c8 ff d9 ff eb ff f1 ff ee ff f0 ff fe ff 10 00 1d 00 1f 00 18 00 """

print(" ".join(word[2:] for word in sentence.split() if len(word) > 2))

>>> @ c e N > E O F 2 & * / ) \r

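Note that any such approach would miss escape sequences such as \r in a normal string, because they are converted to whitespace characters and discarded by the split. To keep them, define the string as a raw string by putting a small r in front of the opening quotes of the multiline string.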
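The effect of the r prefix can be sketched on a shortened, made-up sample; the helper name `extras` is only illustrative:

```python
normal = "00\r 00@ 00\t 00c"   # \r and \t become CR and TAB characters
raw = r"00\r 00@ 00\t 00c"     # backslashes are kept literally

def extras(text):
    """Return the characters after the first two in every over-long group."""
    return [grp[2:] for grp in text.split() if len(grp) > 2]

print(extras(normal))  # ['@', 'c']
print(extras(raw))     # ['\\r', '@', '\\t', 'c']
```

In the normal string the CR and TAB are whitespace, so `split()` swallows them; in the raw string they survive as the two-character texts `\r` and `\t`.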

Upvotes: 0

Harsha Biyani

Reputation: 7268

Try this

 >>> s = """00 1f 00@ 00c 00e 00N 00> 00E 00O 00F 002 00& 00* 00/
 00) 00 1f 00 1c 00  00  00 17 00\r 00 08 00 03 00 f8 ff ea ff e1 ff e1 ff 
e0 ff da ff d2 ff cc ff c9 ff c5 ff c3 ff c3 ff c1 ff ba ff b3 ff b1 ff b2 
ff b3 ff b3 ff b6 ff ba ff b7 ff ab ff 9c ff 93 ff 93 ff 9c ff a5 ff aa ff 
aa ff a3 ff 9a ff 92 ff 95 ff a3 ff b2 ff b9 ff bd ff cb ff df ff e8 ff dd 
ff ca ff c8 ff d9 ff eb ff f1 ff ee ff f0 ff fe ff 10 00 1d 00 1f 00 18 00 
0f 00 0e 00 17 00" 00\' 00% 00\' 001 00= 00D 00G 00I 00M 00O 00N 00M 00M 00O 
00N 00K 00F 00C 00F 00M 00M 00@ 00. 00$ 00& 00* 00& 00 1d 00 17 00 17 00 17 
00 13 00 0f 00 13 00 1f 00* 00/ 00/ 00/ 00/ 00/ 00- 00\' 00 1f 00 18 00 13 
00 10 00 0c 00 08 00 03 00 00 00 fd ff f8 ff f3 ff f1 ff ee ff e8 ff e1 ff 
e1 ff e9 ff f0 ff f0 ff e9 ff e4 ff e1 ff dd ff da ff d8 ff d9 ff d7 ff d2 
ff cc ff c9 ff ca ff ca ff c9 ff c9 ff cd ff d1 ff d3 ff d1 ff d0 ff d1 ff
 d5 ff d9 ff dd ff e1 ff e6 ff e9 ff e9 ff e9 ff ec ff f1 ff f5 ff f8 ff fc 
ff 01 00 03 00 00 00 fb ff f9 ff fc ff 00 00 01 00 01 00 00 00 ff ff ff ff 
01 00 01 00 ff ff fb ff f9 ff fb ff ff ff 01 00 01 00 00 00 ff ff fe ff 00 
00 0b 00 17 00 1c 00 18 00 11 00 0f 00 11 00 10 00 0c 00 08 00 07 00 08 00\t 
00 08 00 04 00 00 00 fc ff f8 ff f7 ff f8 ff f9 ff"""

>>> output=[]  #List to add output

>>> for single_word in s.split():
    if (len(single_word) > 2):
        output.append(single_word[2:])


>>> print (list(set(output))) #Removes the duplicates
['"', '%', '$', "'", '&', ')', '*', '-', '/', '.', '1', '2', '=', '>', '@', 'C', 'E', 'D', 'G', 'F', 'I', 'K', 'M', 'O', 'N', 'c', 'e']


>>> " ".join (list(set(output)))
'" % $ \' & ) * - / . 1 2 = > @ C E D G F I K M O N c e'
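One thing to keep in mind is that `set` does not preserve the order in which the characters appeared. If the order matters, a possible alternative (assuming Python 3.7 or later, where `dict` keeps insertion order) is `dict.fromkeys`, sketched here on a shortened, made-up list:

```python
# Shortened, made-up sample of extracted characters with duplicates.
output = ['@', 'c', '@', '/', 'c', '*', '/']

# dict keys are unique and keep first-seen order.
unique_in_order = list(dict.fromkeys(output))
print(unique_in_order)  # ['@', 'c', '/', '*']
```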

Upvotes: 0

Vivek Sable

Reputation: 10223

In [6]: result = []

In [7]: for i in input_str.split():
   ...:     if len(i)>2:
   ...:         result.append(i[2:])
   ...:         
In [15]: " ".join(result)
Out[15]: '@ c e N > E O F 2 & * / ) " \' % \' 1 = D G I M O N M M O N K F C F M M @ . $ & * & * / / / / / - \''

Upvotes: 1
