maynard

Reputation: 77

Problems transforming data in a dataframe

I've written the function (tested and working) below:

import pandas as pd

def ConvertStrDateToWeekId(strDate):
    # Example of the expected input: '2016-7-15 22:44:09'
    aDate = pd.to_datetime(strDate)
    # isocalendar() returns (ISO year, ISO week number, ISO weekday)
    wk = aDate.isocalendar()[1]
    yr = aDate.isocalendar()[0]
    Format_4_5_4_date = str(yr) + str(wk)
    return Format_4_5_4_date

From what I have seen online, I should be able to use it this way:

ml_poLines = result.value.select('PURCHASEORDERNUMBER', 'ITEMNUMBER', 'PRODUCTCOLORID', 'RECEIVINGWAREHOUSEID', ConvertStrDateToWeekId('CONFIRMEDDELIVERYDATE'))

However, when I "show" my dataframe, the "CONFIRMEDDELIVERYDATE" column still contains the original datetime string. No errors are given.

I've also tried this:

ml_poLines['WeekId'] = (ConvertStrDateToWeekId(ml_poLines['CONFIRMEDDELIVERYDATE']))

and get the following error:

"ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions."

This makes no sense to me. I've also tried the following, with no success:

x = ml_poLines.toPandas()
x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])
ml_poLines2 = spark.createDataFrame(x)
ml_poLines2.show()

The above generates the following error:

AttributeError: 'Series' object has no attribute 'isocalendar'

What have I done wrong?

Upvotes: 0

Views: 181

Answers (2)

maynard

Reputation: 77

This is the workaround that I got to work:

# convert the CONFIRMEDDELIVERYDATE column to a WeekId
x = ml_poLines.toPandas()
x['WeekId'] = x[['ITEMNUMBER', 'CONFIRMEDDELIVERYDATE']].apply(lambda y: ConvertStrDateToWeekId(y['CONFIRMEDDELIVERYDATE']), axis=1)
ml_poLines = spark.createDataFrame(x)
ml_poLines.show()

Not quite as clean as I would like. Maybe someone else can propose a cleaner solution.
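One possibly cleaner route is to skip the round trip through pandas and wrap the conversion in a Spark UDF, so the function runs once per row on the Spark side. The following is only a sketch under assumptions: it assumes CONFIRMEDDELIVERYDATE is a string column shaped like '2016-7-15 22:44:09', and the names convert_str_date_to_week_id and add_week_id are hypothetical helpers, not part of the original code. It uses the standard library's datetime instead of pandas so the UDF has no pandas dependency on the workers.

```python
from datetime import datetime


def convert_str_date_to_week_id(str_date):
    """Return the ISO year followed by the ISO week number, e.g. '201628'."""
    if str_date is None:
        return None  # leave missing dates as nulls
    # Assumed input format, e.g. '2016-7-15 22:44:09'
    parsed = datetime.strptime(str_date, "%Y-%m-%d %H:%M:%S")
    iso_year, iso_week, _ = parsed.isocalendar()
    # Same concatenation as the original function (week is not zero-padded)
    return str(iso_year) + str(iso_week)


def add_week_id(df, date_col="CONFIRMEDDELIVERYDATE", out_col="WeekId"):
    """Hypothetical helper: add a week-id column to a Spark DataFrame."""
    # Imported here so the plain-Python function above works without Spark.
    from pyspark.sql.functions import col, udf
    from pyspark.sql.types import StringType

    # Wrap the row-level function so Spark can apply it to a Column
    week_id_udf = udf(convert_str_date_to_week_id, StringType())
    return df.withColumn(out_col, week_id_udf(col(date_col)))
```

With this in place, something like ml_poLines = add_week_id(ml_poLines) should add the WeekId column without calling toPandas(). If the week ids need to sort correctly as strings, zero-padding the week number may be worth considering.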

Upvotes: 0

jschnoor

Reputation: 41

Your function ConvertStrDateToWeekId takes a single string. But in the following line, the argument of the function call is a pandas Series of strings, so pd.to_datetime returns a Series, which has no isocalendar method:

x['testDates'] = ConvertStrDateToWeekId(x['CONFIRMEDDELIVERYDATE'])

A possible workaround for this error is to use the apply method of the pandas Series, which calls the function once per element:

x['testDates'] = x['CONFIRMEDDELIVERYDATE'].apply(ConvertStrDateToWeekId)

Without more information about the kind of data you are processing, it is hard to provide further help.
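As an alternative to apply, the whole column can probably be converted at once through the .dt accessor. This is a minimal sketch, assuming pandas 1.1 or later (where Series.dt.isocalendar() is available) and using made-up sample data in place of the real CONFIRMEDDELIVERYDATE values:

```python
import pandas as pd

# Made-up sample data standing in for the real column
x = pd.DataFrame(
    {"CONFIRMEDDELIVERYDATE": ["2016-07-15 22:44:09", "2016-01-04 08:00:00"]}
)

# Parse the whole column in one call; the result is a datetime Series
dates = pd.to_datetime(x["CONFIRMEDDELIVERYDATE"])

# .dt.isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns
iso = dates.dt.isocalendar()

# Same concatenation as ConvertStrDateToWeekId, applied column-wise
x["testDates"] = iso["year"].astype(str) + iso["week"].astype(str)
print(x)
```

This should give the same strings as the apply version while avoiding a Python-level call per row.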

Upvotes: 1
