I have recently been made aware of the dangers of chained assignment, and I am trying to use the proper method of indexing in pandas, loc[rowindex, colindex]. I am working with mixed data types (np.float64, list, and string values within the same series); this is unavoidable. I have an integer index.
I am running the following loop over a data frame:
Count = 0
for row in DF.index:
    print(row)
    # Only split order_no when no other column in the row contains a '/'
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
            and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
        Count += 1
Count
Which returns the error:
TypeError: object of type 'int' has no len()
What am I doing wrong?
Within that loop I can do:
print(DF.loc[row, 'order_no'])
and
print(DF.loc[row, 'order_no'] == str(DF.loc[row, 'order_no']).split('/'))
but not
DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
Using the print statement I see that it gets stuck on row 3, yet:
DF.loc[3, 'order_no']
works just fine.
Help appreciated.
EDIT
A workaround is the following:
Count = 0
Vals = []
Ind = []
for row in DF.index:
    # Collect the rows to change instead of assigning inside the loop
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
            and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        Vals.append(DF.order_no[row].split('/'))
        Ind.append(row)
        Count += 1
DF.loc[Ind, 'order_no'] = Vals
In other words, I can build a list of the values to be modified and then change them all at once using .loc. This works fine, which leads me to believe that the issue is not with the values I am trying to assign, but with the assignment process itself.
Here is an example of the type of data I am working on. The code fails on rows 3 and 9, as far as I can tell. Sorry it's in CSV format, but this is how I am reading it into pandas.
https://www.dropbox.com/s/zuy8pj15nlhmcfb/EG2.csv
Using that data, the following:
EG = pd.read_csv('EG.csv')
EG.loc[3, 'order_no'] = str(EG.loc[3, 'order_no']).split('/')
fails with the error:
object of type 'int' has no len()
But
EG['order_no'][3] = str(EG.loc[3, 'order_no']).split('/')
works fine. However, this is the kind of chained assignment I am trying to avoid, as it was giving me issues elsewhere, which is why I thought this was just a syntax error.
Sorry for this now unwieldy question.
Upvotes: 2
Views: 16440
Reputation: 251355
You may be running into dtype issues. The following code works for me:
import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data, dtype=object)
And then:
>>> df.loc[3, 'order_no'] = [1, 2]
>>> df
  order_no working_hr
3   [1, 2]          9
Note the dtype=object. This may be why your errors disappeared when you shortened the DataFrame, especially if you're reading from CSV. In many situations (such as reading from CSV), pandas tries to infer the dtype and picks the most specific one. You can assign a list as a value if the dtype is object, but not if it is (for instance) float64. So check whether your mixed-type column really is set to dtype object.
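As a rough illustration of the dtype inference described above, here is a minimal sketch. The two-row CSV text is made up for the example (it is not the OP's file), and passing `dtype={"order_no": object}` to `read_csv` is just one way to force an object column; note that it keeps the values as strings.

```python
import io

import pandas as pd

# Hypothetical two-row CSV where order_no happens to contain only integers
csv_text = "order_no,smv\n992754,6.17\n859901,8.55\n"

# Default behaviour: pandas infers the most specific dtype it can
inferred = pd.read_csv(io.StringIO(csv_text))
print(inferred.dtypes)

# Forcing the column to object at read time; the values stay as strings
forced = pd.read_csv(io.StringIO(csv_text), dtype={"order_no": object})
print(forced.dtypes)
```

Printing `DF.dtypes` on the real data is a quick way to see which columns were inferred as numeric.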
The same works with your provided CSV:
>>> df = pd.read_clipboard(sep='\t', index_col=0)
>>> df
   buyer       order_no          item                                 smv
0  H&M         992754            Cole tank top                        6.17
1  H&M         859901            Thilo Bottom                         8.55
2  H&M         731231            Palma Short Sleeve Tee               5.65
3  H&M         731231/339260     Palma Price Tee                      5.65
4  H&M         859901/304141     Thilo Paijama Set top/Elva Tank Top  5.80/5.58
5  H&M         768380            Folke Tank Top                       6
6  H&M         596701/590691     Paul Rock Tee                        7.65
7  H&M/Mexx    731231/KIEZ-P002  Palma Short Sleeve Tee/Shorts        5.65/12.85
8  NaN         NaN               NaN                                  NaN
9  Ginatricot  512008/512009     J.Tank top                           4.6
>>> df.loc[3, 'order_no'] = str(df.loc[3, 'order_no']).split('/')
>>> df
   buyer       order_no          item                                 smv
0  H&M         992754            Cole tank top                        6.17
1  H&M         859901            Thilo Bottom                         8.55
2  H&M         731231            Palma Short Sleeve Tee               5.65
3  H&M         [731231, 339260]  Palma Price Tee                      5.65
4  H&M         859901/304141     Thilo Paijama Set top/Elva Tank Top  5.80/5.58
5  H&M         768380            Folke Tank Top                       6
6  H&M         596701/590691     Paul Rock Tee                        7.65
7  H&M/Mexx    731231/KIEZ-P002  Palma Short Sleeve Tee/Shorts        5.65/12.85
8  NaN         NaN               NaN                                  NaN
9  Ginatricot  512008/512009     J.Tank top                           4.6
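If the goal is to apply the same split to every qualifying row, one possible alternative to the row-by-row loop is to build a boolean mask and rebuild the column in a single assignment. This is only a sketch: the three rows are copied from rows 2, 3 and 7 of the data above, `has_slash` is a hypothetical helper, and replacing the whole column sidesteps per-cell assignment entirely rather than fixing it.

```python
import pandas as pd

# Three rows taken from the sample data above (rows 2, 3 and 7)
df = pd.DataFrame({
    "buyer": ["H&M", "H&M", "H&M/Mexx"],
    "order_no": ["731231", "731231/339260", "731231/KIEZ-P002"],
    "item": ["Palma Short Sleeve Tee", "Palma Price Tee",
             "Palma Short Sleeve Tee/Shorts"],
    "smv": ["5.65", "5.65", "5.65/12.85"],
})


def has_slash(series):
    """Hypothetical helper: True where the stringified value contains '/'."""
    return series.astype(str).str.contains("/", regex=False)


# Same condition as the loop: '/' in order_no and in none of the other columns
mask = (has_slash(df["order_no"]) & ~has_slash(df["buyer"])
        & ~has_slash(df["smv"]) & ~has_slash(df["item"]))

# Rebuild the column in one go; the resulting column holds strings and lists
df["order_no"] = [
    str(value).split("/") if selected else value
    for value, selected in zip(df["order_no"], mask)
]
count = int(mask.sum())
print(df)
print(count)
```

Here only the middle row is split, because the last row also has a '/' in its buyer, item and smv values.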
Upvotes: 5
Reputation: 48307
Shorter error-raising code for reference (until the OP includes it in the question):
import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data)
df.loc[3, 'order_no'] = [1,2] # raises error
Inspecting the code, the list value [1,2] is treated by _setitem_with_indexer as a list, and I can't see how this can be avoided so that the value is treated as a scalar.
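One workaround that may be worth trying is sketched below. It rests on two assumptions that are not from this thread: that the column can be cast to object first, and that the scalar setter `.at` (rather than `.loc`) is acceptable for writing a single cell. Behaviour may differ between pandas versions.

```python
import pandas as pd

data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame(data)

# Assumption: casting to object lets the cell hold an arbitrary Python object
df['order_no'] = df['order_no'].astype(object)

# Assumption: .at addresses exactly one cell, so the list is stored as one value
df.at[3, 'order_no'] = [1, 2]
print(df)
```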