Anitha

Reputation: 25

Is there a difference between Series.tolist() and a plain list in pandas?

I defined a function z. It works when I pass a list, but when I pass a Series (even after converting it to a list) it returns wrong answers. The input argument to z has to be a Series. How can I resolve this?

list1 = [np.nan, 14975, 98121]
series1 = pd.Series([np.nan,14975,98121])

z(series1.tolist())
['0', '0', '0']

z(list1)
['0', '1', '98121']

My z function is,

def z(each):
    zipcode_list = []
    for i in each:   
        try:
            if zipcodes.is_real(str(i)):
                zip_code = str(i)      
            else:
                zip_code = str(1)
        except:
            zip_code = str(0)   
        zipcode_list.append(zip_code)
    return zipcode_list

Upvotes: 2

Views: 645

Answers (1)

OrionTheHunter

Reputation: 276

pandas does return a list here (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.tolist.html), but the values in it are floats, not ints. Because the Series contains np.nan, pandas stores the whole Series as float64, so tolist() gives 14975.0 and 98121.0. str(14975.0) is '14975.0', which is not a valid zip code string, so zipcodes.is_real errors and your except branch returns '0'.

You can see this by running the following:

import pandas as pd
import numpy as np
import zipcodes
list1 = [np.nan, 14975, 98121]
series1 = pd.Series([np.nan,14975,98121])


def z(each):
    zipcode_list = []
    for i in each:
        print(i, type(i))
        try:
            if zipcodes.is_real(str(i)):
                zip_code = str(i)
            else:
                zip_code = str(1)
        except Exception:
            zip_code = str(0)
        zipcode_list.append(zip_code)
    return zipcode_list


print(z(series1.tolist()))

print(z(list1))

Output:

nan <class 'float'>
14975.0 <class 'float'>
98121.0 <class 'float'>
['0', '0', '0']
nan <class 'float'>
14975 <class 'int'>
98121 <class 'int'>
['0', '1', '98121']
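
The type difference matters because of how a float is rendered as a string. A minimal sketch in plain Python (no pandas or zipcodes needed) that shows the two string forms and what happens with NaN:

```python
import math

# A float renders with a trailing ".0", so it no longer looks like a 5-digit zip code.
as_float = str(14975.0)
as_int = str(int(14975.0))
print(as_float)  # 14975.0
print(as_int)    # 14975

# NaN is a float and has no integer equivalent; int() raises ValueError for it.
nan = float("nan")
try:
    int(nan)
    nan_converts = True
except ValueError:
    nan_converts = False
print(math.isnan(nan), nan_converts)  # True False
```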

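Converting each value to an int inside z, before turning it into a string, fixes the problem. int(np.nan) raises a ValueError, so the NaN entry still falls into the except branch and returns '0'. See: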

import pandas as pd
import numpy as np
import zipcodes
list1 = [np.nan, 14975, 98121]
series1 = pd.Series([np.nan,14975,98121])


def z(each):
    zipcode_list = []
    for i in each:
        try:
            if zipcodes.is_real(str(int(i))):
                zip_code = str(int(i))
            else:
                zip_code = str(1)
        except Exception:
            zip_code = str(0)
        zipcode_list.append(zip_code)
    return zipcode_list


print(z(series1.tolist()))
# ['0', '1', '98121']

print(z(list1))
# ['0', '1', '98121']
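
If you would rather leave z untouched, another option is to normalise the values before calling it. The helper below is hypothetical (to_zip_strings is not part of pandas or zipcodes) and assumes every non-missing value is a whole number. Treat it as a sketch rather than a drop-in fix:

```python
import math


def to_zip_strings(values):
    """Turn ints or whole-number floats into digit strings; missing values become None."""
    result = []
    for value in values:
        if value is None or (isinstance(value, float) and math.isnan(value)):
            # Keep missing entries visible instead of guessing a value.
            result.append(None)
        else:
            # int() drops the ".0" that str() would otherwise keep on a float.
            result.append(str(int(value)))
    return result


print(to_zip_strings([float("nan"), 14975.0, 98121.0]))
# [None, '14975', '98121']
```

The result could then be passed to the original z. How z treats the None entries depends on what zipcodes.is_real does with the string 'None', so check that case before relying on it.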

Upvotes: 1
