Yun Tae Hwang

How to get just a string from a DataFrame

I am trying to define a function with two arguments: df (a DataFrame) and an integer (the employee ID). The function should return the full name of the employee.

If the given ID does not belong to any employee, I want to return the string "UNKNOWN". If no middle name is given, return only "LAST, FIRST". If only the middle initial is given, return the full name in the format "LAST, FIRST M.", with the middle initial followed by a '.'.

import pandas as pd

def getFullName(df, int1):
    df = pd.read_excel('/home/data/AdventureWorks/Employees.xls')
    newdf = df[(df['EmployeeID'] == int1)]
    print("'" + newdf['LastName'].item() + "," + " " + newdf['FirstName'].item() + " " + newdf['MiddleName'].item() + "." + "'")

getFullName('df', 110)

I wrote this code but ran into two problems: 1) if I don't put quotation marks around df, I get an error message, but I want to pass a DataFrame as the argument, not a string.

2) This code can't handle an employee without a middle name.

I'm sorry, but I used pd.read_excel to read an Excel file that you can't access. I know it will be hard to test the code without the file; if someone lets me know how to create a random DataFrame with these column names, I will go ahead and change the example. Thank you,

Answers (1)

TheF1rstPancake

I created some fake data for this:

   EmployeeID FirstName LastName MiddleName
0           0         a        a          a
1           1         b        b          b
2           2         c        c          c
3           3         d        d          d
4           4         e        e          e
5           5         f        f          f
6           6         g        g          g
7           7         h        h          h
8           8         i        i          i
9           9         j        j       None

EmployeeID 9 has no middle name, but everyone else does. I would break the logic into two parts: the first handles the case where the EmployeeID cannot be found, and the second builds the employee's name. That second part needs two branches of its own, one for when the employee has a middle name and one for when they don't. You could likely combine much of this into single-line statements, but you would probably sacrifice clarity.
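The name-building part can be sketched on its own as a small function, separate from the DataFrame lookup. This sketch follows the question's rules literally (no surrounding quotes, and "LAST, FIRST M." with the middle initial followed by a '.'). The name format_full_name is hypothetical, and reducing a full middle name such as "Michael" to its initial is an assumption about what the question wants.

```python
import math


def format_full_name(last, first, middle=None):
    """Hypothetical helper: returns "LAST, FIRST" or "LAST, FIRST M."."""
    # Treat None, NaN (how pandas marks an empty Excel cell) and blank
    # strings all as "no middle name".
    no_middle = (
        middle is None
        or (isinstance(middle, float) and math.isnan(middle))
        or not str(middle).strip()
    )
    if no_middle:
        return f"{last}, {first}"
    # Assumption: keep only the first letter, so "M" and "Michael" both give "M."
    return f"{last}, {first} {str(middle).strip()[0]}."


print(format_full_name("Smith", "John"))       # Smith, John
print(format_full_name("Smith", "John", "M"))  # Smith, John M.
```

Keeping the formatting in a plain function like this means it can be tested without any Excel file, and the lookup function only has to decide between "UNKNOWN" and calling it.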

I also removed the pd.read_excel call from the function. If you want to pass the DataFrame into the function, then the DataFrame should be created outside of it.

import pandas as pd

def getFullName(df, int1):
    newdf = df[df['EmployeeID'] == int1]

    # if the filtered dataframe is empty, then we can't find the given ID;
    # otherwise, go ahead and build the employee's name
    if newdf.empty:
        print("UNKNOWN")
        return "UNKNOWN"
    else:
        # all strings start with the LastName and FirstName;
        # we then add the MiddleName if it's present
        # and end the string with the final '
        s = "'" + newdf['LastName'].item() + ", " + newdf['FirstName'].item()
        middle = newdf['MiddleName'].item()
        # pd.notna() treats both None and NaN (what read_excel uses for an
        # empty cell) as missing; a bare `if middle:` would let NaN through
        if pd.notna(middle):
            s = s + " " + middle + "."
        s = s + "'"
        print(s)
        return s

I have the function return the string as well as print it, in case you want to manipulate it further. But that was just my choice.

If you run getFullName(df, 1) you should get 'b, b b.', and getFullName(df, 9) should give 'j, j'.

So in full, it would be:

import pandas as pd

df = pd.read_excel('/home/data/AdventureWorks/Employees.xls')
# the outputs below assume the fake data shown further down
getFullName(df, 1)   # prints 'b, b b.'
getFullName(df, 9)   # prints 'j, j'
getFullName(df, 10)  # prints UNKNOWN
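If you need names for many IDs, an alternative (not what the function above does) is to build every full name at once with vectorized string operations and then look names up by ID. This is a minimal sketch on a small hypothetical subset of the fake data; it drops the surrounding quote characters, and lookup is a hypothetical name.

```python
import pandas as pd

# Hypothetical sample: three rows mirroring the fake data in this answer
df = pd.DataFrame({
    'EmployeeID': [0, 1, 9],
    'FirstName': ['a', 'b', 'j'],
    'LastName': ['a', 'b', 'j'],
    'MiddleName': ['a', 'b', None],
})

# "LAST, FIRST" for every row in one step
full = df['LastName'] + ', ' + df['FirstName']

# Append " M." only on rows where MiddleName is present (None/NaN count as missing)
has_middle = df['MiddleName'].notna()
full[has_middle] = full[has_middle] + ' ' + df.loc[has_middle, 'MiddleName'] + '.'

# Index the names by EmployeeID so lookups use the ID, not the row position
names_by_id = pd.Series(full.values, index=df['EmployeeID'])


def lookup(emp_id):
    # Series.get returns the default when the ID is not in the index
    return names_by_id.get(emp_id, 'UNKNOWN')


print(lookup(1))   # b, b b.
print(lookup(9))   # j, j
print(lookup(10))  # UNKNOWN
```

The trade-off is that the names are computed once up front, so each lookup is a single index access instead of a fresh boolean filter over the whole DataFrame.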

Fake data:

import pandas as pd

d = {'EmployeeID' : [0,1,2,3,4,5,6,7,8,9],
     'FirstName' : ['a','b','c','d','e','f','g','h','i','j'],
     'LastName' : ['a','b','c','d','e','f','g','h','i','j'],
     'MiddleName' : ['a','b','c','d','e','f','g','h','i',None]}
df = pd.DataFrame(d)
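The question also asked how to create a random DataFrame with these column names. One way, sketched below under the assumption that random letters are good enough as names, uses Python's random module with a fixed seed so the data is reproducible. make_fake_employees is a hypothetical helper, and the roughly one-third share of missing middle names is an arbitrary choice.

```python
import random
import string

import pandas as pd


def make_fake_employees(n=10, seed=0):
    """Hypothetical helper: n random employees with the question's columns."""
    rng = random.Random(seed)  # seeded so every run gives the same data

    def rand_name():
        # a capitalized random "name" of 3 to 8 lowercase letters
        length = rng.randint(3, 8)
        return "".join(rng.choice(string.ascii_lowercase) for _ in range(length)).capitalize()

    return pd.DataFrame({
        'EmployeeID': list(range(n)),
        'FirstName': [rand_name() for _ in range(n)],
        'LastName': [rand_name() for _ in range(n)],
        # a single-letter middle initial, or None for roughly a third of rows
        'MiddleName': [rng.choice(string.ascii_uppercase) if rng.random() > 0.33 else None
                       for _ in range(n)],
    })


df = make_fake_employees()
print(df)
```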
