Stevo

Reputation: 85

What's the difference between the two lists?

I have a list of lists of strings, and I used a regex to collapse the runs of spaces in those strings, which works perfectly:

newrooms = re.sub(r'\s+', " ", str(newrooms)) 

the original list looks like -

[['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)'], ['8      10-11am', 'MR252 (30)'], ['9      11-12pm', 'MR252 (30)'], ['10      10-11am', 'MR252 (30)'], ['10      11-12pm', 'MR251 (22)'], ['12      10-11am', 'MR107 (63)'], ['12      11-12pm', 'MR252 (30)'], ['17      10-11am', 'MR252 (30)'], ['18      11-12pm', 'MR252 (30)'], ['19      10-11am', 'MR252 (30)'], ['19      11-12pm', 'MR265 (24)'], ['20      10-11am', 'CB203 (26)'], ['20      11-12pm', 'MR252 (30)'], ['27      10-11am', 'MR252 (30)'], ['28      11-12pm', 'MR252 (30)'], ['29      10-11am', 'MR252 (30)'], ['42      11-12pm', 'MR252 (30)'], ['42       2-4pm                MA ONLY', 'MR252 (30)'], ['43      10-11am', 'MR252 (30)'], ['44      10-11am', ''], ['44      11-12pm', 'MR252 (30)']]

Before the substitution, print newrooms[3] prints the fourth sublist: "['9      11-12pm', 'MR252 (30)']"

after using the re.sub to remove the spaces the list looks like

[['4 11-12pm', 'MR252 (30)'], ['5 10.30-12pm', 'MR252 (30)'], ['8 10-11am', 'MR252 (30)'], ['9 11-12pm', 'MR252 (30)'], ['10 10-11am', 'MR252 (30)'], ['10 11-12pm', 'MR251 (22)'], ['12 10-11am', 'MR107 (63)'], ['12 11-12pm', 'MR252 (30)'], ['17 10-11am', 'MR252 (30)'], ['18 11-12pm', 'MR252 (30)'], ['19 10-11am', 'MR252 (30)'], ['19 11-12pm', 'MR265 (24)'], ['20 10-11am', 'CB203 (26)'], ['20 11-12pm', 'MR252 (30)'], ['27 10-11am', 'MR252 (30)'], ['28 11-12pm', 'MR252 (30)'], ['29 10-11am', 'MR252 (30)'], ['42 11-12pm', 'MR252 (30)'], ['42 2-4pm MA ONLY', 'MR252 (30)'], ['43 10-11am', 'MR252 (30)'], ['44 10-11am', ''], ['44 11-12pm', 'MR252 (30)']]

It's just the same (minus the extra spaces), but now:

print newrooms[3] prints ... "4"

All the code is here:

print newrooms[3]
print newrooms
newrooms = re.sub(r'\s+', " ", str(newrooms))
print newrooms[3]
print newrooms

Why does the list no longer act like a list?

OK guys, I see: I was converting the whole list to a string with str(newrooms). What I should be doing is:

print newrooms[3]
print newrooms
for obj in newrooms:
    obj[0] = re.sub(r'\s+', " ", obj[0])
print newrooms[3]
print newrooms

Upvotes: 2

Views: 96

Answers (5)

Padraic Cunningham

Reputation: 180391

You can use str.join and str.split on each string in each sublist, rather than converting the list to a string:

l = [['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)'], ['8      10-11am', 'MR252 (30)'], ['9      11-12pm', 'MR252 (30)'], ['10      10-11am', 'MR252 (30)'], ['10      11-12pm', 'MR251 (22)'], ['12      10-11am', 'MR107 (63)'], ['12      11-12pm', 'MR252 (30)'], ['17      10-11am', 'MR252 (30)'], ['18      11-12pm', 'MR252 (30)'], ['19      10-11am', 'MR252 (30)'], ['19      11-12pm', 'MR265 (24)'], ['20      10-11am', 'CB203 (26)'], ['20      11-12pm', 'MR252 (30)'], ['27      10-11am', 'MR252 (30)'], ['28      11-12pm', 'MR252 (30)'], ['29      10-11am', 'MR252 (30)'], ['42      11-12pm', 'MR252 (30)'], ['42       2-4pm                MA ONLY', 'MR252 (30)'], ['43      10-11am', 'MR252 (30)'], ['44      10-11am', ''], ['44      11-12pm', 'MR252 (30)']]

l[:] = [[" ".join(s.split()) for s in sub] for sub in l]

from pprint import pprint as pp
pp(l)

Output will be a list:

[['4 11-12pm', 'MR252 (30)'],
 ['5 10.30-12pm', 'MR252 (30)'],
 ['8 10-11am', 'MR252 (30)'],
 ['9 11-12pm', 'MR252 (30)'],
 ['10 10-11am', 'MR252 (30)'],
 ['10 11-12pm', 'MR251 (22)'],
 ['12 10-11am', 'MR107 (63)'],
 ['12 11-12pm', 'MR252 (30)'],
 ['17 10-11am', 'MR252 (30)'],
 ['18 11-12pm', 'MR252 (30)'],
 ['19 10-11am', 'MR252 (30)'],
 ['19 11-12pm', 'MR265 (24)'],
 ['20 10-11am', 'CB203 (26)'],
 ['20 11-12pm', 'MR252 (30)'],
 ['27 10-11am', 'MR252 (30)'],
 ['28 11-12pm', 'MR252 (30)'],
 ['29 10-11am', 'MR252 (30)'],
 ['42 11-12pm', 'MR252 (30)'],
 ['42 2-4pm MA ONLY', 'MR252 (30)'],
 ['43 10-11am', 'MR252 (30)'],
 ['44 10-11am', ''],
 ['44 11-12pm', 'MR252 (30)']]
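
As a small sketch of why the slice assignment l[:] = ... is used above: it replaces the contents of the existing list object, so any other name bound to the same list sees the cleaned strings too. The names rooms and alias are made up for illustration. Note also that " ".join(s.split()) strips leading and trailing whitespace, whereas re.sub(r'\s+', " ", s) would leave a single space there.

```python
rooms = [['4      11-12pm', 'MR252 (30)'], ['44      10-11am', '']]
alias = rooms  # hypothetical second name bound to the same list object

# Slice assignment mutates the existing list in place,
# so 'alias' still refers to the (now cleaned) list.
rooms[:] = [[" ".join(s.split()) for s in sub] for sub in rooms]

print(alias[0])        # the cleaned first sublist, seen through the alias
print(alias is rooms)  # still the very same list object

# split() with no arguments also drops leading/trailing whitespace.
padded = " ".join("  42   2-4pm  ".split())
print(repr(padded))
```

If no other reference to the list exists, a plain l = [...] rebinding gives the same visible result.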

Upvotes: 1

MaxNoe

Reputation: 14997

You convert the list newrooms to a single string in this line:

newrooms = re.sub(r'\s+', " ", str(newrooms))

So it is now just one string and no longer a list. What you want is to apply the substitution to the individual elements of the list:

newrooms = [
    [re.sub(r'\s+', " ", elem) for elem in sublist]
    for sublist in newrooms
]

This results in:

>>> newrooms[3]
['9 11-12pm', 'MR252 (30)']
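
If the same cleanup is needed in several places, the comprehension above could be wrapped in a small helper. This is only a sketch: the names collapse_whitespace and _WHITESPACE are hypothetical, and precompiling the pattern with re.compile is an optional convenience, not something the question requires.

```python
import re

# Hypothetical module-level pattern: one or more whitespace characters.
_WHITESPACE = re.compile(r'\s+')

def collapse_whitespace(rows):
    """Return a new list of lists with each whitespace run replaced by one space."""
    return [[_WHITESPACE.sub(" ", elem) for elem in row] for row in rows]

sample = [
    ['9      11-12pm', 'MR252 (30)'],
    ['42       2-4pm                MA ONLY', 'MR252 (30)'],
]
cleaned = collapse_whitespace(sample)
print(cleaned[0])
print(cleaned[1])
```

The helper returns a new list and leaves its argument untouched, so the result still supports list indexing such as cleaned[0].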

Upvotes: 1

timgeb

Reputation: 78680

What you want is to replace each run of whitespace with a single blank in every string in a list of lists.

What you actually do is convert the whole list to a string and then run the substitution on that string.

Here's what happens - I will use a shortened version of your original list for readability:

>>> import re
>>> newrooms = [['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)']]
>>> newrooms_str = str(newrooms)
>>> newrooms_str
"[['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)']]"
>>> newrooms_str = re.sub(r'\s+', " ", newrooms_str)
>>> newrooms_str
"[['4 11-12pm', 'MR252 (30)'], ['5 10.30-12pm', 'MR252 (30)']]"
>>> newrooms_str[3]
'4'

As you can see, you are passing a string to re.sub, which returns a string. The fourth character of that string is the character '4' which you see when you do newrooms_str[3].

In order to get your desired result, you need to operate on the individual strings in your list of lists:

>>> newrooms
[['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)']]
>>> newrooms = [[re.sub(r'\s+', " ", string) for string in sublist] for sublist in newrooms]
>>> newrooms
[['4 11-12pm', 'MR252 (30)'], ['5 10.30-12pm', 'MR252 (30)']]
>>> newrooms[1]
['5 10.30-12pm', 'MR252 (30)']

Upvotes: 1

Vader

Reputation: 3873

It returns an unexpected result because you convert the list to a string before replacing. Try this instead:

import re
newrooms = [['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)'], ['8      10-11am', 'MR252 (30)'], ['9      11-12pm', 'MR252 (30)'], ['10      10-11am', 'MR252 (30)'], ['10      11-12pm', 'MR251 (22)'], ['12      10-11am', 'MR107 (63)'], ['12      11-12pm', 'MR252 (30)'], ['17      10-11am', 'MR252 (30)'], ['18      11-12pm', 'MR252 (30)'], ['19      10-11am', 'MR252 (30)'], ['19      11-12pm', 'MR265 (24)'], ['20      10-11am', 'CB203 (26)'], ['20      11-12pm', 'MR252 (30)'], ['27      10-11am', 'MR252 (30)'], ['28      11-12pm', 'MR252 (30)'], ['29      10-11am', 'MR252 (30)'], ['42      11-12pm', 'MR252 (30)'], ['42       2-4pm                MA ONLY', 'MR252 (30)'], ['43      10-11am', 'MR252 (30)'], ['44      10-11am', ''], ['44      11-12pm', 'MR252 (30)']]

newrooms = [[re.sub(r'\s+', " ", room) for room in rooms] for rooms in newrooms]
print newrooms[3]

Upvotes: 0

flaschbier

Reputation: 4177

After

newrooms = re.sub(r'\s+', " ", str(newrooms))

newrooms, formerly a list, becomes a string.

print newrooms[3]

prints the 4th character in that string. Python is dynamically typed, so a name like newrooms can be rebound to an object of a different type without any error.
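
A quick way to see this, as a minimal sketch using a shortened version of the list from the question (the names type_before, type_after and char_at_3 are only for illustration):

```python
import re

newrooms = [['4      11-12pm', 'MR252 (30)'], ['5      10.30-12pm', 'MR252 (30)']]
type_before = type(newrooms).__name__
print(type_before)   # the name is bound to a list here

# str() turns the whole list into its text representation,
# and re.sub returns a string, so the name is rebound to a str.
newrooms = re.sub(r'\s+', " ", str(newrooms))
type_after = type(newrooms).__name__
print(type_after)

# Indexing a string yields a single character, not a sublist.
char_at_3 = newrooms[3]
print(char_at_3)
```

Checking type(newrooms) right after a suspicious assignment is often enough to spot this kind of accidental rebinding.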

Upvotes: 4
