Reputation: 5127
I am trying to create a list based on the input below, but I don't get the expected output. Can anyone suggest where I am going wrong?
INPUT:
CR FA CL Title
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
415560 WLAN 656886 To Record SMD Event Logging
I want an OUTPUT like
[['CR', 'FA', 'CL', 'TITLE'], ['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario'], ['415560', 'WLAN', '656886','To Record SMD Event Logging']]
But I see it is being created like this:
[['CR', 'FA', 'CL', 'TITLE'], ['', '409452', 'WLAN', '656885\tAge out RSSI values from buffer in Beacon miss scenario'], ['', '415560', 'WLAN', '656886\tTo Record SMD Event Logging']]
Python code
for i in info.splitlines():
    index = re.split(r'\W+',i,3)
    CRlist.append(index)
Upvotes: 2
Views: 76
Reputation: 29093
If you have \t as the separator, then you can use this (note that you can use strip, checking x.strip() to detect an empty entry and skip it):
info = """
CR FA CL Title
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
415560 WLAN 656886 To Record SMD Event Logging
"""
[[x.strip() for x in row.split('\t') if x.strip()] for row in info.split('\n')]
If you have multiple spaces between columns, you can try this (note that it also splits the title on every space):
[[x.strip() for x in row.split(' ') if x.strip()] for row in info.split('\n')]
or combined:
[[x.strip() for x in row.replace('\t', ' ').split(' ') if x.strip()] for row in info.split('\n')]
and finally using split(None, 3):
[row.split(None, 3) for row in info.split('\n')]
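As a minimal sketch of the split(None, 3) variant, the snippet below assumes the columns are separated by tabs (the sample data is rewritten with explicit \t, which is an assumption about the real input) and filters out blank rows, which would otherwise show up as empty lists:

```python
# Sample input; the explicit \t separators are an assumption about the real data.
info = """
CR\tFA\tCL\tTitle
409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario
415560\tWLAN\t656886\tTo Record SMD Event Logging
"""

# split(None, 3) splits on any run of whitespace, at most 3 times,
# so the fourth field (the title) keeps its internal spaces.
# strip() removes stray leading/trailing whitespace, and the
# `if row.strip()` filter drops blank rows such as the empty first line.
rows = [row.strip().split(None, 3) for row in info.splitlines() if row.strip()]

for row in rows:
    print(row)
```

This only works as intended when none of the first three columns contains whitespace.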
Upvotes: 0
Reputation: 362587
I have a feeling you should be using the csv module instead, but here is a non-regex solution:
>>> s = '''CR FA CL Title
... 409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
... 415560 WLAN 656886 To Record SMD Event Logging'''
>>> [x.strip().split(None, 3) for x in s.splitlines()]
[['CR', 'FA', 'CL', 'Title'], ['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario'], ['415560', 'WLAN', '656886', 'To Record SMD Event Logging']]
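For comparison, here is a rough sketch of the csv approach. It assumes the input really is tab-delimited (the question does not say so explicitly), and it reads from an in-memory string rather than a file:

```python
import csv
import io

# Assumed tab-delimited input; the real file may use a different separator.
s = (
    "CR\tFA\tCL\tTitle\n"
    "409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario\n"
    "415560\tWLAN\t656886\tTo Record SMD Event Logging\n"
)

# csv.reader accepts any iterable of lines; io.StringIO stands in for an open file.
reader = csv.reader(io.StringIO(s), delimiter="\t")

# Blank lines come back as empty lists, so skip them.
rows = [row for row in reader if row]

print(rows)
```

With a real file, the csv documentation recommends opening it with newline='' so the reader handles line endings itself.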
Upvotes: 1
Reputation: 365697
The output you're getting is exactly what you'd expect if there were extra whitespace at the start of each line but the first.
One common reason for this is that you've tried parsing files with the wrong line endings, without using universal-newlines mode, and just gotten things hopelessly confused.
For example, these two lines may look identical in your text editor:
409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
\r409452 WLAN 656885 Age out RSSI values from buffer in Beacon miss scenario
But your re.split will do very different things with them:
['409452', 'WLAN', '656885', 'Age out RSSI values from buffer in Beacon miss scenario']
['', '409452', 'WLAN', '656885\tAge out RSSI values from buffer in Beacon miss scenario']
The solution is to strip off the excess whitespace. You can try to write a more complicated regexp, or just do re.split(r'\W+', s.lstrip(), 3).
Since you mentioned wanting to strip trailing whitespace as well, use strip instead of lstrip: re.split(r'\W+', s.strip(), 3).
But I'm not sure why you're using regexp in the first place, when you could just do s.strip().split(None, 3).
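The difference described above can be reproduced with a short sketch. The variable names clean and dirty are made up for illustration, and the tab separators are an assumption about the real data:

```python
import re

# Two lines that may look identical in an editor; `dirty` has a leading \r.
clean = "409452\tWLAN\t656885\tAge out RSSI values from buffer in Beacon miss scenario"
dirty = "\r" + clean

# The leading \r matches \W+, so the first field is empty and one of the
# three allowed splits is used up before the title is reached.
split_clean = re.split(r'\W+', clean, maxsplit=3)
split_dirty = re.split(r'\W+', dirty, maxsplit=3)

# Stripping first restores the expected result, with or without a regexp.
fixed_regex = re.split(r'\W+', dirty.strip(), maxsplit=3)
fixed_plain = dirty.strip().split(None, 3)

print(split_clean)
print(split_dirty)
print(fixed_regex)
print(fixed_plain)
```

maxsplit is passed by keyword here because recent Python versions deprecate passing it positionally to re.split.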
Upvotes: 2