user1795998

Reputation: 5127

List getting created incorrectly

I am trying to create a list of lists from the input below, but I don't get the expected output. Can anyone suggest where I am going wrong?

INPUT:

CR  FA  CL  Title
409452  WLAN    656885  Age out RSSI values from buffer in Beacon miss scenario
415560  WLAN    656886  To Record SMD Event Logging

I want an OUTPUT like

[['CR', 'FA', 'CL', 'TITLE'], ['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario'], ['415560', 'WLAN', '656886','To Record SMD Event Logging']]

But what I actually get is

[['CR', 'FA', 'CL', 'TITLE'], ['', '409452', 'WLAN', '656885\tAge out RSSI values from buffer in Beacon miss scenario'], ['', '415560', 'WLAN', '656886\tTo Record SMD Event Logging']]

Python code

for i in info.splitlines():
    index = re.split(r'\W+', i, 3)
    CRlist.append(index)

Upvotes: 2

Views: 76

Answers (3)

Artsiom Rudzenka

Reputation: 29093

If your separator is \t, then you can use the following. Note the `if x.strip()` filter: it checks whether an entry is empty after stripping and skips it if so.

info = """
          CR  FA  CL  Title
          409452  WLAN    656885  Age out RSSI values from buffer in Beacon miss scenario
          415560  WLAN    656886  To Record SMD Event Logging
       """
[[x.strip() for x in row.split('\t') if x.strip()] for row in info.split('\n')]

If you have multiple spaces between columns, you can try this:

[[x.strip() for x in row.split('  ') if x.strip()] for row in info.split('\n')]

or combined:

[[x.strip() for x in row.replace('\t', '  ').split('  ') if x.strip()] for row in info.split('\n')]

and finally using split(None, 3):

[row.split(None, 3) for row in info.split('\n')]
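One thing to watch with the comprehensions above: because the triple-quoted string starts and ends with a newline, info.split('\n') also yields blank rows, which come out as empty lists. A minimal sketch that filters them out, assuming the real data is tab-separated (the `if row.strip()` filter is an addition for illustration, not part of the original one-liner):

```python
info = """
CR\tFA\tCL\tTitle
409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario
415560\tWLAN\t656886\tTo Record SMD Event Logging
"""

# Split each non-blank row on runs of whitespace, at most 3 times,
# so the free-text title stays together as the fourth field.
rows = [row.split(None, 3) for row in info.split('\n') if row.strip()]

for row in rows:
    print(row)
```

Without the filter, the first and last entries of the result would be empty lists.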

Upvotes: 0

wim

Reputation: 362587

I have a feeling you would be better off using the csv module, but here is a non-regex solution:

>>> s = '''CR  FA  CL  Title
... 409452  WLAN    656885  Age out RSSI values from buffer in Beacon miss scenario
... 415560  WLAN    656886  To Record SMD Event Logging'''
>>> [x.strip().split(None, 3) for x in s.splitlines()]
[['CR', 'FA', 'CL', 'Title'], ['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario'], ['415560', 'WLAN', '656886', 'To Record SMD Event Logging']]
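For completeness, here is a rough sketch of the csv route. It assumes the columns are separated by exactly one tab each, which may not match your real file; io.StringIO just stands in for an open file object.

```python
import csv
import io

# Assumption: fields are separated by single tab characters.
data = (
    "CR\tFA\tCL\tTitle\n"
    "409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario\n"
    "415560\tWLAN\t656886\tTo Record SMD Event Logging\n"
)

# csv.reader accepts any iterable of lines; delimiter='\t' makes it tab-aware.
rows = list(csv.reader(io.StringIO(data), delimiter='\t'))

for row in rows:
    print(row)
```

If the columns are padded with a variable number of spaces or tabs, csv will produce empty fields, and the split(None, 3) approach is the simpler fit.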

Upvotes: 1

abarnert

Reputation: 365697

The output you're getting is exactly what you'd expect if there were extra whitespace at the start of each line but the first.

One common reason for this is that you've tried parsing files with the wrong line endings, without using universal-newlines mode, and just gotten things hopelessly confused.

For example, these two lines may look identical in your text editor:

409452  WLAN    656885  Age out RSSI values from buffer in Beacon miss scenario
\r409452  WLAN    656885  Age out RSSI values from buffer in Beacon miss scenario

But your re.split will do very different things with them:

['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario']
['', '409452', 'WLAN', '656885\tAge out RSSI values from buffer in Beacon miss scenario']
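You can reproduce this yourself with a short script. The leading '\r' here is my guess at what is in your data; any other leading non-word character would have the same effect.

```python
import re

clean = '409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario'
dirty = '\r' + clean  # assumed stray carriage return at the start of the line

# \W+ matches the leading '\r' too, so the first of the 3 allowed splits
# is spent producing an empty string, and the title stays glued to the CL.
clean_fields = re.split(r'\W+', clean, 3)
dirty_fields = re.split(r'\W+', dirty, 3)

print(clean_fields)
print(dirty_fields)
```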

The solution is to strip off the excess whitespace. You can try to write a more complicated regexp, or just do re.split(r'\W+', s.lstrip(), 3).

Since you mentioned wanting to strip trailing whitespace as well, use strip instead of lstrip: re.split(r'\W+', s.strip(), 3).

But I'm not sure why you're using regexp in the first place, when you could just do s.strip().split(None, 3).
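There is also a practical difference between the two: \W+ splits on any non-word character, not only whitespace. A small sketch with a made-up FA value ('WLAN-5G' is hypothetical, not from your data) shows how that could bite:

```python
import re

# Hypothetical line: stray '\r' in front, trailing whitespace, hyphenated FA.
line = '\r409452\tWLAN-5G\t656885\tAge out RSSI values\t '

# The regex treats '-' as a separator and breaks the FA field in two.
via_regex = re.split(r'\W+', line.strip(), 3)

# str.split with None splits on whitespace only, so the field survives.
via_split = line.strip().split(None, 3)

print(via_regex)
print(via_split)
```

With the plain split, the hyphenated field stays intact and the title is still kept as a single fourth element.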

Upvotes: 2
