Woody Pride

Reputation: 13955

Label based indexing Pandas (.loc)

I have recently been made aware of the dangers of chained assignment, and I am trying to use the proper method of indexing in pandas, loc[rowindex, colindex]. I am working with mixed data types (a mix of np.float64, list and string within the same series); this is unavoidable. I have an integer index.

I am running the following loop through a data frame

Count = 0
for row in DF.index:
    print(row)
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')
        Count += 1

Count

Which returns the error:

 TypeError: object of type 'int' has no len()

What am I doing wrong?

Within that loop I can do:

print(DF.loc[row, 'order_no'])

and

print(DF.loc[row, 'order_no'] == str(DF.loc[row, 'order_no']).split('/'))

but not

DF.loc[row, 'order_no'] = str(DF.loc[row, 'order_no']).split('/')

Using the print statement I see that it gets stuck on row 3, yet:

DF.loc[3, 'order_no']

works just fine.

Help appreciated.

EDIT

A workaround is the following:

Count = 0
Vals = []
Ind = []
for row in DF.index:
    if '/' in str(DF.order_no[row]) and '/' not in str(DF.buyer[row]) \
    and '/' not in str(DF.smv[row]) and '/' not in str(DF.item[row]):
        Vals.append(DF.order_no[row].split('/'))
        Ind.append(row)
        Count +=1

DF.loc[Ind, 'order_no'] = Vals    

In other words, I can build a list of the values to be modified and then change them all at once using .loc. This works fine, which leads me to believe that the issue is not with the values I am trying to assign, but with the assignment process itself.

Here is an example of the type of data I am working on. The code fails on rows 3 and 9 as far as I can tell. Sorry it's in csv format, but this is how I am reading it into pandas.

https://www.dropbox.com/s/zuy8pj15nlhmcfb/EG2.csv

Using that data if the following is done:

EG = pd.read_csv('EG.csv')
EG.loc[3, 'order_no'] = str(EG.loc[3, 'order_no']).split('/')

Fails with the error

object of type 'int' has no len()

But

EG['order_no'][3] = str(EG.loc[3, 'order_no']).split('/')

works fine. However, this is the kind of chained assignment I am trying to avoid, as it was giving me issues elsewhere, which is why I thought this was just a syntax error.

Sorry for this now unwieldy question.

Upvotes: 2

Views: 16440

Answers (2)

BrenBarn

Reputation: 251355

You may be running into dtype issues. The following code works for me:

import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data, dtype=object)

And then:

>>> df.loc[3, 'order_no'] = [1, 2]
>>> df
  order_no working_hr
3   [1, 2]          9

Note the dtype=object. This may be why your errors disappeared when you shortened the DataFrame, especially if you're reading from CSV. In many situations (such as reading from CSV), pandas tries to infer the dtype and picks the most specific one. You can assign a list as a value if the dtype is object, but not if it's (for instance) float64. So check whether your mixed-type column really is set to dtype object.
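To make the dtype check concrete, here is a minimal sketch. The two-row frame is made up for illustration, and using .at for the single-cell write is my own assumption (it addresses exactly one cell); it is not what the code above uses.

```python
import pandas as pd

# Hypothetical two-row frame; the values mimic the question's data.
df = pd.DataFrame({"order_no": [992754, 731231], "smv": [6.17, 5.65]})

# With only integers in the column, pandas infers a numeric dtype, not object.
inferred = df["order_no"].dtype

# Cast the column to object so a cell can hold an arbitrary Python object.
df["order_no"] = df["order_no"].astype(object)

# Assumption: .at is used here because it targets exactly one cell.
df.at[1, "order_no"] = ["731231", "339260"]

print(inferred, "->", df["order_no"].dtype)
print(df)
```

If the column is already object (check with df.dtypes), the cast is a no-op and can be skipped.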

The same works with your provided CSV:

>>> df = pd.read_clipboard(sep='\t', index_col=0)
>>> df
        buyer          order_no                                 item         smv
0         H&M            992754                        Cole tank top        6.17
1         H&M            859901                         Thilo Bottom        8.55
2         H&M            731231               Palma Short Sleeve Tee        5.65
3         H&M     731231/339260                      Palma Price Tee        5.65
4         H&M     859901/304141  Thilo Paijama Set top/Elva Tank Top   5.80/5.58
5         H&M            768380                       Folke Tank Top           6
6         H&M     596701/590691                        Paul Rock Tee        7.65
7    H&M/Mexx  731231/KIEZ-P002        Palma Short Sleeve Tee/Shorts  5.65/12.85
8         NaN               NaN                                  NaN         NaN
9  Ginatricot     512008/512009                           J.Tank top         4.6
>>> df.loc[3, 'order_no'] = str(df.loc[3, 'order_no']).split('/')
>>> df
        buyer          order_no                                 item         smv
0         H&M            992754                        Cole tank top        6.17
1         H&M            859901                         Thilo Bottom        8.55
2         H&M            731231               Palma Short Sleeve Tee        5.65
3         H&M  [731231, 339260]                      Palma Price Tee        5.65
4         H&M     859901/304141  Thilo Paijama Set top/Elva Tank Top   5.80/5.58
5         H&M            768380                       Folke Tank Top           6
6         H&M     596701/590691                        Paul Rock Tee        7.65
7    H&M/Mexx  731231/KIEZ-P002        Palma Short Sleeve Tee/Shorts  5.65/12.85
8         NaN               NaN                                  NaN         NaN
9  Ginatricot     512008/512009                           J.Tank top         4.6
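The loop in the question can probably also be written without any per-cell assignment, by building a boolean mask and replacing the whole column in one go. This is only a sketch on a made-up three-row frame; has_slash, mask and count are names I introduced for illustration.

```python
import pandas as pd

# Hypothetical sample shaped like the question's data (all columns object).
df = pd.DataFrame(
    {
        "buyer": ["H&M", "H&M", "H&M/Mexx"],
        "order_no": ["731231", "731231/339260", "731231/KIEZ-P002"],
        "item": ["Palma Short Sleeve Tee", "Palma Price Tee",
                 "Palma Short Sleeve Tee/Shorts"],
        "smv": ["5.65", "5.65", "5.65/12.85"],
    },
    dtype=object,
)


def has_slash(col):
    """Boolean Series: True where the stringified value contains '/'."""
    return df[col].astype(str).str.contains("/", regex=False, na=False)


# Same condition as the loop: a slash in order_no and in no other column.
mask = (has_slash("order_no") & ~has_slash("buyer")
        & ~has_slash("smv") & ~has_slash("item"))

# Build the new column as an object Series, then assign the whole column.
new_values = [str(v).split("/") if m else v
              for v, m in zip(df["order_no"], mask)]
df["order_no"] = pd.Series(new_values, index=df.index, dtype=object)

count = int(mask.sum())
print(count)
print(df["order_no"].tolist())
```

Assigning a whole column sidesteps the question of how a list is interpreted for a single cell, at the cost of rebuilding the column.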

Upvotes: 5

alko

Reputation: 48307

Shorter error-raising code for reference (until the OP includes it in the question):

import pandas as pd
data = {'working_hr': {3: 9.0}, 'order_no': {3: 731231}}
df = pd.DataFrame.from_dict(data)
df.loc[3, 'order_no'] = [1,2] # raises error

Inspecting the code, the list value [1,2] is treated by _setitem_with_indexer as a list rather than as a scalar, and I can't see how to avoid this so that the value is treated as a scalar.

Upvotes: 0
