Reputation: 43
I am working on an excel file containing below columns:-
df1
Number Dateopened Status Date closed Work
1 2019-10-30 10.10 Open 2019-10-30 12.10 walking
2 2019-9-30 11.10 Open 2019-9-31 12.10 cycling
3 2019-5-30 10.10 Closed 2019-5-30 12.10 nothing
4 2019-4-30 10.10 Open 2019-4-30 12.10 walking
I can change the date format using the below code:-
df1['Dateopened'] = pd.to_datetime(df1['Dateopened']).dt.strftime("%b %d,%Y - %I:%M%p")
df1['Date closed'] = pd.to_datetime(df1['Date closed']).dt.strftime("%b %d,%Y - %I:%M%p")
In my code i need to manually enter the column names.Is there something i can do so that the python code itself search such columns and modify the data format.
Upvotes: 1
Views: 85
Reputation:
Using a simple List Comprehension like this does the trick:
df[[n for n in df.columns]]
n - represents our column names.
So, to use your example we can substitute like this:
pd.to_datetime(df1['Dateopened']).dt.strftime("%b %d,%Y - %I:%M%p")
for
pd.to_datetime(df1[[n for n in df1.columns]]).dt.strftime("%b %d,%Y - %I:%M%p")
You can even do a check for ex. "Dateopened" or else perform some function.
pd.to_datetime(df1[[n if n == "Dateopened" else "do something here..." for n in df1.columns]]).dt.strftime("%b %d,%Y - %I:%M%p")
Upvotes: 0
Reputation: 3503
Try-Except for EAFP style.
If you have lots of date formatting then this is way to go. Will run faster than regex in that case.
from openpyxl import load_workbook
from datetime import datetime
work_book = load_workbook("Z:/github/StackOverFlow/62717532/source.xlsx")
work_sheet = work_book.active
format_ = '%Y-%m-%d'
for index, row in enumerate(work_sheet.rows):
data = [i.value for i in row]
# trying format to each rows
for idx, item in enumerate(data):
try:
row[idx] = datetime.strptime(item, format_)
except (TypeError, ValueError): # if fail, it's not date.
pass
print(data)
source.xlsx
result
['Num', 'Date', 'Stat']
[1, datetime.datetime(2019, 10, 30, 0, 0), 'Open']
[2, datetime.datetime(2019, 10, 31, 0, 0), 'Closed']
[3, datetime.datetime(2019, 11, 1, 0, 0), 'Open']
[4, datetime.datetime(2019, 11, 2, 0, 0), 'Open']
[5, datetime.datetime(2019, 11, 3, 0, 0), 'Closed']
Upvotes: 0
Reputation: 1475
If all your columns containing dates are in a specific format, you can look into using a regex. The solution would look something like this:
toModify = []
for column in df:
if(re.match(pattern, column)):
toModify.append(column)
And then you have all the column names that need to be modified. Iterate over that.
Upvotes: 0
Reputation: 2987
You can also check column names:
for col in df1.columns:
if col.lower().startswith("date"):
df1[col] = pd.to_datetime(df1[col]).dt.strftime("%b %d,%Y - %I:%M%p")
Upvotes: 1
Reputation: 681
You can use regex to detect the columns where there is a date field.
\d*[-]\d*[-]\d*.*
can help you detect the date portion, assuming the columns begin with a date. You can apply the regex on one row of the dataframe and filter the columns where the regex matches.
Upvotes: 0