piseynir
piseynir

Reputation: 225

changing column data type using pandas

I have a dataframe with date column. Column includes "custom" and "general" datatypes. I want to change it all datetime format. "43891" means "01.03.2020 00:00:00"

TARİH
28.02.2020  00:00:00 -->custom
28.02.2020  00:00:00 -->custom
28.02.2020  00:00:00 -->custom
43891 -->general
43891 -->general
43891 -->general
.
.

Here what I have tried below, same problem with me (ref. changing all dates to standard date time in dataframe)

import pandas as pd
from datetime import datetime, timedelta

def from_excel_ordinal(ordinal, _epoch0=datetime(1899, 12, 31)):
    if ordinal >= 60:
        ordinal -= 1  # Excel leap year bug, 1900 is not a leap year!
    return (_epoch0 + timedelta(days=ordinal)).replace(microsecond=0)

df = pd.read_excel('D:\Documents\Desktop\deneme/deneme1.xlsx', sheet_name='Sheet1')
m = df['TARİH'].astype(str).str.isdigit()

df.loc[m, 'TARİH'] = \
df.loc[m, 'TARİH']\
  .astype(int)\
  .apply(from_excel_ordinal)

df['TARİH'] = pd.to_datetime(df['TARİH'], errors='coerce')

df.to_excel('D:\Documents\Desktop\deneme/deneme1.xlsx',index=False)

When I apply these codes, I am sharing output below. "General type" cells turns to "NaT".

print(df.loc[3280:3286, 'TARİH'])

Output: 
2020-02-28
2020-02-28
2020-02-28
2020-02-28
NaT
NaT
NaT
Name: TARİH, dtype: datetime64[ns]

In this solution, changing all dates to standard date time in dataframe all column is "general" datatype. Due to that problem is solved. But when I apply above codes to my dataframe, Column D format is turning to "datetime" format. Due to that I am taking below error when I run the codes 2nd time:

TypeError: cannot astype a datetimelike from [datetime64[ns]] to [int32]

I will use these codes everyday. Due to that I need solve the format cell problem. I can try also other methods if you offer.

Also I have 3000 rows. So I'm not able to apply manual methods.

Upvotes: 2

Views: 555

Answers (2)

piseynir
piseynir

Reputation: 225

There are no answers came to question with pandas method. Due to that I used "pynput.mouse" library.

When you change the column style to "short date" with mouse controller method, df['TARİH'] = pd.to_datetime(df['TARİH']) this code runs due to there are no mixed datetimes and integers passed array, whole column have same format.

If you have pandas or any other method, please answer.

from pynput.mouse import Button, Controller
import pandas as pd

#Go to desktop
mouse= Controller ()
mouse.move(1358,751)
mouse.click(Button.left, 1)

#Open folder
mouse.position=(632, 108)
time.sleep(2)
mouse.click(Button.left,2)

#Open excel file
mouse.position=(354, 127)
time.sleep(2)
mouse.click(Button.left,2)

#Select D column in excel
mouse.position=(250, 256)
time.sleep(10)
mouse.click(Button.left,1)

#Go to format cell area
mouse.position=(709, 87)
time.sleep(2)
mouse.click(Button.left,1)

#Change format to short date
mouse.position=(663, 297)
time.sleep(2)
mouse.click(Button.left,1)

#Close excel file
mouse.position=(1337, 11)
time.sleep(2)
mouse.click(Button.left,1)

#Save excel file
mouse.position=(597, 400)
time.sleep(2)
mouse.click(Button.left,1)

#wait till excel close
time.sleep(3)

print("Formula writing operation is starting..")
df = pd.read_excel('D:\Documents\Desktop\deneme/2020 Data_çalışma.xlsx', sheet_name='Sheet1')
df['TARİH'] = pd.to_datetime(df['TARİH'])
print("Formula is written..")


Output:
TARİH
28.02.2020  00:00:00
28.02.2020  00:00:00
28.02.2020  00:00:00
01.03.2020  00:00:00
01.03.2020  00:00:00
01.03.2020  00:00:00
.
.

Upvotes: 0

Quang Hoang
Quang Hoang

Reputation: 150745

IIUC, 43891 is the number of days since a zero-date:

# zero_date = 1899-12-29
zero_date = pd.to_datetime('2020-03-01') - pd.to_timedelta(43891, unit='D')

And then you can do np.select:

# you need dayfist
custom = pd.to_datetime(df['TARİH'], dayfirst=True, errors='coerce')

# general type
df['TARİH'] = np.where(custom.isna(), df['TARİH'],
                       (custom - zero_date)/pd.to_timedelta('1D')
                      )

Upvotes: 1

Related Questions