Reputation: 111
The series I am handling looks like this:
qa_answers['date_of_birth']
1 []
2 []
...
2600 [1988/11/23]
2601 [1992/7/15]
2602 [1993/11/8"]
2603 [1997/08/31]
2604 [1971/2/11]
2605 [1979/11/1"]
2606 [1993/9/19]
2607 [1985/01/12]
2608 [1977/11/3"]
2609 [1981/7/2"]
2610 [1952/4/9"]
2611 [1991/8/20]
2612 [1993/1/31]
Name: date_of_birth, dtype: object
This problem might consist of two parts:
qa_answers['date_of_birth'] = pd.to_datetime(qa_answers['date_of_birth'],errors='coerce')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-147-96dff0351764> in <module>()
28 qa_answers['date_of_birth2']= qa_answers['answers'].str.findall(dob2)
29 qa_answers['date_of_birth'] = qa_answers['date_of_birth1'] + qa_answers['date_of_birth2']
---> 30 qa_answers['date_of_birth'] = pd.to_datetime(qa_answers['date_of_birth'],errors='coerce')
31
32
4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in unique(values)
403
404 table = htable(len(values))
--> 405 uniques = table.unique(values)
406 uniques = _reconstruct_data(uniques, dtype, original)
407 return uniques
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.unique()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'list'
So I guess I should first extract the element from each list. How can I do that?
P.S. Could you also give some tips for removing the stray " character from the elements?
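For context, the traceback above passes through pandas' unique(), which hashes each value, and Python lists cannot be hashed. The following is only a minimal plain-Python sketch of that cause, using a made-up cell value; it does not reproduce the pandas call itself:

```python
# A cell shaped like the ones in the question: a list holding one string.
cell = ['1988/11/23']

try:
    hash(cell)                      # lists are mutable, so hashing fails
    list_hashable = True
except TypeError as exc:
    list_hashable = False
    message = str(exc)

# The string inside the list hashes without any problem.
first_hashable = isinstance(hash(cell[0]), int)

print(list_hashable, first_hashable)
```

This is why pulling the string out of each list before calling pd.to_datetime is a reasonable first step.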
Upvotes: 0
Views: 144
Reputation: 148965
First convert each non-empty list to its first element, with the stray quote removed, and each empty list to an empty string:
df.date_of_birth.apply(lambda x: x[0].replace('"', '') if len(x) > 0 else '')
gives:
1
2
...
2600 1988/11/23
2601 1992/7/15
2602 1993/11/8
2603 1997/08/31
2604 1971/2/11
2605 1979/11/1
2606 1993/9/19
2607 1985/01/12
2608 1977/11/3
2609 1981/7/2
2610 1952/4/9
2611 1991/8/20
2612 1993/1/31
Then you can easily convert that to a datetime column:
pd.to_datetime(df.date_of_birth.apply(lambda x: x[0].replace('"', '') if len(x) > 0 else ''))
you get:
1 NaT
2 NaT
2600 1988-11-23
2601 1992-07-15
2602 1993-11-08
2603 1997-08-31
2604 1971-02-11
2605 1979-11-01
2606 1993-09-19
2607 1985-01-12
2608 1977-11-03
2609 1981-07-02
2610 1952-04-09
2611 1991-08-20
2612 1993-01-31
Name: date_of_birth, dtype: datetime64[ns]
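The two steps above can be put together in a small self-contained sketch. The sample values and the helper name first_clean are made up for illustration; the real column comes from str.findall in the question:

```python
import pandas as pd

# Hypothetical sample mirroring the question: each cell is a list of matches.
raw = pd.Series(
    [[], ['1988/11/23'], ['1993/11/8"'], ['1997/08/31']],
    name="date_of_birth",
)

def first_clean(matches):
    """Return the first match without stray quotes, or '' for an empty list."""
    return matches[0].replace('"', '') if matches else ''

cleaned = raw.apply(first_clean)   # plain strings, '' where nothing matched
dates = pd.to_datetime(cleaned)    # empty strings become NaT

print(dates)
```

A named function instead of a lambda is only a readability choice; the logic is the same as in the one-liner.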
Upvotes: 2
Reputation: 36624
This would do it:
pd.to_datetime(<your series>.str[1:-1].str.replace('"', ''))
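Just substitute the right column name. I tested it by copying your example from the clipboard, which reads each value as a string such as "[1988/11/23]", so .str[1:-1] strips the brackets: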
df = pd.read_clipboard(index_col=0).iloc[:, 0]
pd.to_datetime(df.str[1:-1].str.replace('"', ''))
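Note that .str[1:-1] strips brackets only when each cell is a string. If the cells are real Python lists, as str.findall produces, a variant that may work is .str[0], which picks the first element of each list and yields NaN for empty lists. A minimal sketch with made-up sample data:

```python
import pandas as pd

# Hypothetical sample: real lists, as produced by Series.str.findall.
raw = pd.Series(
    [[], ['1988/11/23'], ['1992/7/15'], ['1979/11/1"']],
    name="date_of_birth",
)

first = raw.str[0]                                  # NaN for empty lists
cleaned = first.str.replace('"', '', regex=False)   # drop the stray quote
dates = pd.to_datetime(cleaned)                     # NaN becomes NaT

print(dates)
```

Passing regex=False makes the replacement a literal one, so the result does not depend on the pandas version's default.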
Upvotes: 1