Reputation: 16478
I have two questions. First, filling in the data at the end triggers the error shown below. Second, since I am not very familiar with pandas, this code is probably unidiomatic. If you have any improvements, feel free to help make it more compact and efficient.
The code is supposed to create a crosswalk from x to y. The database may contain the same x<->y relationship several times, but the mapping should be unique. For every x, I check that the database is actually correct: if there is more than one relation for that x, they must all map to the same y.
Beginning of the crosswalk.csv:
x,y
832,"6231"
0,"00000000"
0,"00000000"
0,"00000000"
0,"00000000"
0,"00000000"
0,"00000000"
840,"6214"
842,"6111"
The code
data = pd.read_csv('data/crosswalk_short.csv')
df = pd.DataFrame(data)
xs = df.x.unique()
result = pd.DataFrame(index=xs)
result.fillna(NaN)
for x in xs:
    ys = df[df.x == x].y
    range = arange(0, len(ys.index))
    ys = ys.reindex(range)
    if (range[-1] > 0 and not isnan(ys[1]) ):
        print 'error!'
    result._ix[x] = ys[0]
The error:
File "<ipython-input-129-4cf0c04508c4>", line 1, in <module>
result._ix[x] = ys[0]
TypeError: 'NoneType' object does not support item assignment
Upvotes: 0
Views: 1144
Reputation: 25662
Anything with a single underscore as the first character of a name is generally "private", which in the pandas code base really means "subject to change". So you shouldn't be using _ix for anything. Use loc, iloc, [] syntax, or ix to perform assignment and to select subsets of your data. This error happens because _ix is not instantiated until you call ix (and its value is None until that happens), but this implementation detail is completely irrelevant to you as a user of pandas. Use the public APIs and you usually won't get these kinds of errors.
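As a rough illustration of the advice above, here is a minimal sketch that fills the result through the public loc indexer instead of _ix. The tiny DataFrame and the "y" column name on result are assumptions made for the example, not taken from the question's file. Note that ix itself was removed in later pandas releases, so loc and iloc are the safer choices today.

```python
import pandas as pd

# Hypothetical miniature of the question's data; y is kept as a string.
df = pd.DataFrame({
    "x": [832, 0, 0, 840],
    "y": ["6231", "00000000", "00000000", "6214"],
})

# One row per distinct x, with an empty "y" column to fill in.
result = pd.DataFrame(index=df["x"].unique(), columns=["y"], dtype=object)

for x in result.index:
    # All y values recorded for this x.
    ys = df.loc[df["x"] == x, "y"]
    # Label-based assignment through the public .loc indexer.
    result.loc[x, "y"] = ys.iloc[0]

print(result)
```

Using iloc[0] to take the first match avoids having to reindex ys so that its labels start at 0.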
Also, this line
result.fillna(NaN)
is a no-op because by default fillna returns a copy. If you want to update result in place, do
result.fillna(NaN, inplace=True)
This API convention is fairly consistent throughout pandas. That is, for methods where it makes sense to do so, the function signatures have something like
object.method(..., inplace=False)
by default.
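To make the copy-versus-in-place distinction concrete, here is a small sketch. The frame and the fill value 0.0 are made up for the illustration (filling NaN with NaN, as in the question, would show no visible change either way).

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"y": [1.0, np.nan, 3.0]})

# The returned copy is discarded here, so frame still holds its NaN.
frame.fillna(0.0)
still_missing = int(frame["y"].isna().sum())

# Binding the returned copy to a name keeps the filled values.
filled = frame.fillna(0.0)

# With inplace=True the original object itself is modified.
frame.fillna(0.0, inplace=True)
missing_after_inplace = int(frame["y"].isna().sum())

print(still_missing, missing_after_inplace)
```

Assigning the returned copy (filled = frame.fillna(...)) is often the easier form to read, since it makes the data flow explicit.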
As for your second question, it looks like you want to check whether all duplicate xs have the same y value. One way to do that is:
df.groupby('x').filter(lambda x: x.count() > 1).groupby('x').y.nunique() == 1
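This says:
1. Group by the 'x' column
2. Keep only the groups with more than one row (that is, the values duplicated in 'x')
3. Group the remaining rows by the 'x' column
4. Check whether there is exactly one unique 'y' for each value in 'x'
If 4. is False for any of the groups, that means you have x values repeated, where the y values are different.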
Here's an example of this in action (I've modified your original dataset a little bit):
In [94]: df = pd.read_csv(StringIO('''x,y
832,"6231"
1,"00000000"
1,"00000001"
0,"00000000"
0,"00000000"
0,"00000000"
0,"00000000"
840,"6214"
840,"6111"'''))
In [95]: df.groupby('x').filter(lambda x: x.count() > 1).groupby('x').y.nunique() == 1
Out[95]:
x
0 True
1 False
840 False
dtype: bool
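For the "compact and efficient" part of the question, the same idea can be pushed a little further. The following is only a sketch under stated assumptions: it skips the filter step, since nunique on its own already separates consistent from conflicting groups, and it reads y as a string (an assumption, made to keep leading zeros). The names conflicts and crosswalk are introduced here for illustration.

```python
from io import StringIO

import pandas as pd

csv_text = '''x,y
832,"6231"
1,"00000000"
1,"00000001"
0,"00000000"
0,"00000000"
840,"6214"
840,"6111"'''

# dtype=str for y keeps leading zeros such as "00000000".
df = pd.read_csv(StringIO(csv_text), dtype={"y": str})

# Number of distinct y values per x; anything above 1 is a conflict.
distinct_y = df.groupby("x")["y"].nunique()
conflicts = distinct_y[distinct_y > 1].index.tolist()

# Crosswalk restricted to the x values that map to exactly one y.
clean = df[~df["x"].isin(conflicts)].drop_duplicates()
crosswalk = clean.set_index("x")["y"]

print(conflicts)
print(crosswalk)
```

If conflicts comes back empty, the data is consistent and crosswalk covers every x without any explicit loop.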
Upvotes: 3