Reputation: 33
I have a transposed DataFrame tr:
| | 7128 | 8719 | 14051 | 14636 |
---|---|---|---|---|
JDUTC_0 | 2451957.36 | 2452149.36 | 2457243.98 | 2452531.89 |
JDUTC_1 | 2451957.37 | 2452149.36 | 2457243.99 | 2452531.90 |
JDUTC_2 | 2451957.37 | 2452149.36 | 2457244.00 | 2452531.91 |
JDUTC_3 | NaN | 2452149.36 | NaN | NaN |
JDUTC_4 | NaN | 2452149.36 | NaN | NaN |
JDUTC_5 | NaN | 2452149.36 | NaN | NaN |
JDUTC_6 | 1.23 | 2452149.37 | NaN | NaN |
JDUTC_7 | NaN | NaN | NaN | NaN |
JDUTC_8 | NaN | NaN | NaN | NaN |
JDUTC_9 | NaN | NaN | NaN | NaN |
And I create dict 'a' with this block of code:
a = {}
b = []
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b
    print(a)
Which gives me this output:
7128
[3, 4, 5, 7, 8, 9]
{7128: [3, 4, 5, 7, 8, 9]}
8719
[7, 8, 9]
{7128: [7, 8, 9], 8719: [7, 8, 9]}
14051
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9]}
14636
[3, 4, 5, 6, 7, 8, 9]
{7128: [3, 4, 5, 6, 7, 8, 9], 8719: [3, 4, 5, 6, 7, 8, 9], 14051: [3, 4, 5, 6, 7, 8, 9],
14636: [3, 4, 5, 6, 7, 8, 9]}
What I expect dict 'a' to look like is this:
{7128: [3, 4, 5, 7, 8, 9],
 8719: [7, 8, 9],
 14051: [3, 4, 5, 6, 7, 8, 9],
 14636: [3, 4, 5, 6, 7, 8, 9]}
What am I doing wrong? Why is a[_] = b overwriting all the previous keys when print(_) verifies that _ is always the next column label?
Upvotes: 0
Views: 93
Reputation: 9865
With conventional variable names, I would change your code, after this setup:
import numpy as np
import pandas as pd
import sys
# StringIO lives in a different module on Python 2 and Python 3.
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
s = StringIO("""idx 7128 8719 14051 14636
JDUTC_0 2451957.36 2452149.36 2457243.98 2452531.89
JDUTC_1 2451957.37 2452149.36 2457243.99 2452531.90
JDUTC_2 2451957.37 2452149.36 2457244.00 2452531.91
JDUTC_3 NaN 2452149.36 NaN NaN
JDUTC_4 NaN 2452149.36 NaN NaN
JDUTC_5 NaN 2452149.36 NaN NaN
JDUTC_6 1.23 2452149.37 NaN NaN
JDUTC_7 NaN NaN NaN NaN
JDUTC_8 NaN NaN NaN NaN
JDUTC_9 NaN NaN NaN NaN""")
# r"\s+" splits on any run of whitespace, so it handles tabs and spaces alike.
tr = pd.read_csv(s, sep=r"\s+", index_col=0)
(A question should include minimal working code, but people often forget parts of it, e.g. the imports and the code that builds the DataFrame.)
to:
a = {}
b = []
for name, values in tr.items():
    b.clear()  # this is problematic, as you know
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    a[name] = b
The continue and pass statements are not necessary here: they only tell the loop to go on, which it does anyway. Python also does not force you to write an else branch:
for name, values in tr.items():
    b.clear()  # This is still problematic at this stage.
    for ind, val in enumerate(values):
        if np.isnan(val):
            b.append(ind)
    a[name] = b
Collecting data like this with for loops is better done with a list comprehension:
a = {}
for name, values in tr.items():
    b = [ind for ind, val in enumerate(values) if np.isnan(val)]
    a[name] = b
# Now the result is already correct, because b is a new list on every iteration.
Finally, you can build the dictionary itself with a comprehension. That turns the whole thing into a one-liner, and a readable one once you are familiar with comprehensions:
a = {name: [i for i, x in enumerate(vals) if np.isnan(x)] for name, vals in tr.items()}
You can see the result:
a
# which returns:
{'7128': [3, 4, 5, 7, 8, 9],
'8719': [7, 8, 9],
'14051': [3, 4, 5, 6, 7, 8, 9],
'14636': [3, 4, 5, 6, 7, 8, 9]}
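As a possible alternative to checking each element with np.isnan, the same dictionary can be sketched with pandas' own isna(). This is a different technique from the comprehension above, not a correction of it. The small frame tr_demo is a made-up stand-in for tr, and the name nan_positions is chosen only for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for `tr`: two columns with NaN in different rows.
tr_demo = pd.DataFrame(
    {
        "7128": [1.0, np.nan, 3.0, np.nan],
        "8719": [1.0, 2.0, 3.0, np.nan],
    }
)

# isna() marks the NaN entries of each column; np.flatnonzero returns the
# positional indices at which that boolean mask is True.
nan_positions = {
    name: np.flatnonzero(col.isna().to_numpy()).tolist()
    for name, col in tr_demo.items()
}

print(nan_positions)  # {'7128': [1, 3], '8719': [3]}
```

Each dictionary value is built from scratch here as well, so no list is shared between keys.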
List comprehensions lean toward functional programming (FP), which avoids mutation (such as the b.append() or b.clear() methods). As you have seen, your case demonstrates how easily mutation produces a bug. It also adds to the discussion of why FP, although it looks brain-unfriendly at first sight, is actually the more brain-friendly way to program.
List comprehensions are the Pythonic form of "map", and an "if" inside a list comprehension is the Pythonic equivalent of "filter". Both are second nature to FP people.
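To make the map/filter correspondence concrete, here is a minimal sketch on a small made-up list (values and the other names are illustrative only). It uses math.isnan so that it runs without NumPy.

```python
import math

values = [2.5, math.nan, 4.0, math.nan]

# Comprehension: keep the positions whose value is NaN.
with_comprehension = [i for i, x in enumerate(values) if math.isnan(x)]

# The same thing spelled out: filter selects the (index, value) pairs
# whose value is NaN, and map extracts the index from each pair.
nan_pairs = filter(lambda pair: math.isnan(pair[1]), enumerate(values))
with_map_filter = list(map(lambda pair: pair[0], nan_pairs))

print(with_comprehension)  # [1, 3]
print(with_map_filter)     # [1, 3]
```

Neither version mutates an existing list; each builds a new one.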
Upvotes: 1
Reputation: 2543
The problem is that you are assigning the same list to every key.
a = {}
b = []  # <--- you create one list 'b'
for _, contents in tr.items():
    b.clear()
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b  # <--- assigns that same list to every key
    print(a)
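See my comments in the code above. The line b.clear() only empties that one list in place; it does not create a new list. Every key in a therefore refers to the same list object and shows whatever it currently contains.
To make the code behave as you intended, create a new list inside the loop: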
a = {}
for _, contents in tr.items():
    b = []  # <--- a new list is created on every iteration
    for ind, val in enumerate(contents):
        if np.isnan(val):
            b.append(ind)
            continue
        else:
            pass
    print(_)
    print(b)
    a[_] = b  # <--- now you assign the new list 'b' to a[_]
    print(a)
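The aliasing described above can be checked directly with the is operator. This is a minimal sketch with made-up names (shared, by_key, copied). It also shows one other possible fix, storing a copy, as an alternative to creating a new list inside the loop.

```python
shared = []
by_key = {}
for key in ("x", "y"):
    shared.clear()          # empties the one existing list in place
    shared.append(key)
    by_key[key] = shared    # stores a reference to the list, not a copy

print(by_key)                      # {'x': ['y'], 'y': ['y']}
print(by_key["x"] is by_key["y"])  # True: both keys hold the same object

# Alternative: store a snapshot of the current contents instead.
copied = {}
for key in ("x", "y"):
    shared.clear()
    shared.append(key)
    copied[key] = list(shared)     # list(...) builds a new, independent list

print(copied)                      # {'x': ['x'], 'y': ['y']}
```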
Upvotes: 1