user9431057

Reputation: 1253

Split by Space and add values - Python

I have a list like this,

sample_lsit = ['ST,PAT A V0068 04/18/19 07/02/19 54 7 0.00 70.42',
               'ST,PAT A V0068 04/18/19 07/02/19 54 8 0.00 70.42',
               'LK,LON J V0067 07/02/19 7 26 0.00 486.00',
               'LK,LON J V0074 07/02/19 7 28 0.00 194.00',
               'LN,BET W V0195 05/16/19 07/02/19 77 2 2.33 36.49',
               'LN,BET W V0195 05/16/19 07/02/19 77 3 2.38 33.16']

In the third and fourth entries the first date is missing; that is how the data arrives. I want None in the position where the date is missing. I am trying to split each entry in the list on spaces like this,

for i in sample_lsit:
    print(i.split(' '))

I am getting an output like this,

['ST,PAT', 'A', 'V0068', '04/18/19', '07/02/19', '54', '7', '0.00', '70.42']
['ST,PAT', 'A', 'V0068', '04/18/19', '07/02/19', '54', '8', '0.00', '70.42']
['LK,LON', 'J', 'V0067', '07/02/19', '7', '26', '0.00', '486.00']
['LK,LON', 'J', 'V0074', '07/02/19', '7', '28', '0.00', '194.00']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '2', '2.33', '36.49']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '3', '2.38', '33.16']

However, I need my output like this,

['ST,PAT', 'A', 'V0068', '04/18/19', '07/02/19', '54', '7', '0.00', '70.42']
['ST,PAT', 'A', 'V0068', '04/18/19', '07/02/19', '54', '8', '0.00', '70.42']
['LK,LON', 'J', 'V0067', None, '07/02/19', '7', '26', '0.00', '486.00']
['LK,LON', 'J', 'V0074', None, '07/02/19', '7', '28', '0.00', '194.00']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '2', '2.33', '36.49']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '3', '2.38', '33.16']

How can I achieve this? I have been searching for this split with space and add

Upvotes: 0

Views: 58

Answers (2)

wsidl

Reputation: 106

This requires a unique approach since without any knowledge of the input, you can't determine what value is missing.

For these instances, it's probably good to have some tests to perform on each value to find that it meets the correct format. This can be done in a number of ways, but all rely on some kind of testing to validate the input.

The easiest way is to write a function that validates the split list and fills in values where needed. In your case, if you know the second date will always be provided, you can check whether the values at indexes 3 and 4 are both dates. If not, insert None at index 3:

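from datetime import datetime

def test(input_list):
  try:
    # Both positions must parse as dates; otherwise the first date is missing.
    datetime.strptime(input_list[3], "%m/%d/%y")
    datetime.strptime(input_list[4], "%m/%d/%y")
  except (ValueError, IndexError):
    input_list.insert(3, None)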
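
Applied to data like that in the question, the date check could be used roughly as follows. This is only a minimal sketch: `is_date` and `split_with_missing_date` are hypothetical helper names, and it assumes that only the first date (index 3) can ever be missing.

```python
from datetime import datetime

def is_date(value, fmt="%m/%d/%y"):
    """Return True if value parses as a date in the given format."""
    try:
        datetime.strptime(value, fmt)
        return True
    except (TypeError, ValueError):
        return False

def split_with_missing_date(line):
    # Hypothetical helper: assumes only the first date (index 3) can be absent.
    fields = line.split()
    if not (len(fields) > 4 and is_date(fields[3]) and is_date(fields[4])):
        fields.insert(3, None)
    return fields

sample_list = [
    'ST,PAT A V0068 04/18/19 07/02/19 54 7 0.00 70.42',
    'LK,LON J V0067 07/02/19 7 26 0.00 486.00',
]
rows = [split_with_missing_date(line) for line in sample_list]
for row in rows:
    print(row)
```

Returning a new list instead of mutating the argument keeps the helper usable inside a list comprehension.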

The other option is to use a schema validation library like voluptuous or good to perform the check and fill in default values to meet your requirements.

import good

def test(input_list):
  schema = good.Schema([
    good.All(str, good.Length(6), good.Match(r'[A-Z]{2},[A-Z]{3}')),
    good.All(str, good.Length(1), good.Match(r'[A-Z]')),
    good.All(str, good.Length(5), good.Match(r'[A-Z]\d{4}')),
    # The dates in the data are month/day/year.
    good.All(str, good.Length(8), good.Date('%m/%d/%y'), good.Default(None)),
    good.All(str, good.Length(8), good.Date('%m/%d/%y')),
    ...
  ])
  schema(input_list)

Upvotes: 1

It isn't too hard; the nasty part is that a missing entry shows up as three spaces, not just two.

sample_list = ['ST,PAT A V0068 04/18/19 07/02/19 54 7 0.00 70.42',
               'ST,PAT A V0068   04/18/19 07/02/19 54 8 0.00 70.42',
               'LK,LON J V0067   07/02/19 7 26 0.00 486.00',
               'LK,LON J V0074 07/02/19 7 28 0.00 194.00',
               'LN,BET W V0195 05/16/19 07/02/19 77 2 2.33 36.49',
               'LN,BET W V0195 05/16/19 07/02/19 77 3 2.38 33.16']
result = [[x if x else None for x in line.replace('   ', '  ').split(' ')] for line in sample_list]
for line in result:
    print(line)

Output:

['ST,PAT', 'A', 'V0068', '04/18/19', '07/02/19', '54', '7', '0.00', '70.42']
['ST,PAT', 'A', 'V0068', None, '04/18/19', '07/02/19', '54', '8', '0.00', '70.42']
['LK,LON', 'J', 'V0067', None, '07/02/19', '7', '26', '0.00', '486.00']
['LK,LON', 'J', 'V0074', '07/02/19', '7', '28', '0.00', '194.00']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '2', '2.33', '36.49']
['LN,BET', 'W', 'V0195', '05/16/19', '07/02/19', '77', '3', '2.38', '33.16']

Since list comprehensions can be confusing for beginners to Python, the one-line comprehension above is (more or less) equivalent to the following:

result = []
for line in sample_list:
    temp = []
    for x in line.replace('   ', '  ').split(' '): # replace three spaces with just two before splitting
        if x: # if x is not an empty string, then we can add it
            temp.append(x)
        else: # otherwise the entry is missing, so store None
            temp.append(None)
    result.append(temp) # collect the finished row
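
To see why the replace step is needed, it may help to look at how `str.split` treats runs of spaces. The snippet below is a small illustration on a made-up, shortened line (an assumption for demonstration, not taken from the real data):

```python
line = 'LK,LON J V0067   07/02/19 7'  # three spaces mark the missing date

no_arg = line.split()        # whitespace runs collapse, so the gap is lost
one_space = line.split(' ')  # each extra space yields an empty string

# Collapse three spaces to two so exactly one empty string remains,
# then turn that empty string into None.
collapsed = line.replace('   ', '  ')
fields = [x if x else None for x in collapsed.split(' ')]

print(no_arg)
print(one_space)
print(fields)
```

Without the replace, the three spaces would produce two empty strings and therefore two None values instead of one.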

Upvotes: 1
