Luka Vlaskalic
Luka Vlaskalic

Reputation: 465

Converting a python string index to a datetime one

I am importing some data that has one column for date and another for the hour. I would like to combine them into a single DateTime column.

I have used

df2 = pd.read_csv(csv_file_path,
              parse_dates=[['Date', 'Hour']],
              infer_datetime_format=True)

which has returned

      Date_Hour    Unnamed: 0      Data
0   2018-01-01 0         3         51.29
1   2018-01-01 1         4         43.59    
2   2018-01-01 2         5         93.60    
3   2018-01-01 3         6         54.78    
4   2018-01-01 4         7         14.35

but the column Date_Hour is not in the DateTime format, and when I try to convert it I get an error

df2.set_index('Date_Hour', inplace=True)
pd.to_datetime(df2.index)

gives

ValueError: ('Unknown string format:', '2018-01-01 0')

I have also tried using

pd.to_datetime(prices.index, format='%y-%m-%d %H')

The data that I have imported looks like this;

 ,Date,Hour,Data
3,2018-01-01,0,51.29
4,2018-01-01,1,43.59
5,2018-01-01,2,93.60
6,2018-01-01,3,54.78
7,2018-01-01,4,14.35
8,2018-01-01,5,18.60

And I would like the output to be

                          Data
 Date_Hour
2018-01-01 00:00         51.29
2018-01-01 01:00         43.59  
2018-01-01 02:00         93.60  
2018-01-01 03:00         54.78  
2018-01-01 04:00         14.35

Upvotes: 1

Views: 89

Answers (3)

Ananay Mital
Ananay Mital

Reputation: 1475

Charles Landau was close. Just some changes

df

      sr          Date  Hour     Data
0      3    2018-01-01     0    51.29
1      4    2018-01-01     1    43.59
2      5    2018-01-01     2    93.60
3      6    2018-01-01     3    54.78
4      7    2018-01-01     4    14.35
5      8    2018-01-01     5    18.60


df["Date"] = pd.to_datetime(df["Date"], unit="ns")
df["Hour"] = pd.to_timedelta(df["Hour"])
df["Date_Hour"]=(df["Date"] + df["Hour"]).dt.strftime('%Y-%m-%d %H:%M')
df.set_index("Date_Hour")

                    sr        Date  Hour     Data
       Date_Hour                
2018-01-01 00:00     3  2018-01-01     0    51.29
2018-01-01 01:00     4  2018-01-01     1    43.59
2018-01-01 02:00     5  2018-01-01     2    93.60
2018-01-01 03:00     6  2018-01-01     3    54.78
2018-01-01 04:00     7  2018-01-01     4    14.35
2018-01-01 05:00     8  2018-01-01     5    18.60

Upvotes: 1

Charles Landau
Charles Landau

Reputation: 4265

Edit: a few answers greatly improved on my first crack at this. I'm going to leave this up since it is referenced in another answer.

Let's start with setting the dates to datetime type.

df["Date"] = pd.to_datetime(df["Date"], unit="ms")

We want to add hours to that so they need to be Timedelta.

df["Hour"] = df["Hour"].apply(lambda x: pd.Timedelta(hours=int(x)))

Now a simple sum:

df["Date_hour"] = df["Date"] + df["Hour"]

Set a datetime index the same way you would any other column of datetime dtype:

df = df.set_index("Date_hour")

Upvotes: 1

JE_Muc
JE_Muc

Reputation: 5774

In this specific case the missing leading 0 of the hour is problematic. You can avoid this importing without parsing to datetime and then using zfill before parsing to datetime:

df2 = pd.read_csv(csv_file_path)
df2.index = pd.to_datetime(df2.Date + df2.Hour.astype(str).str.zfill(2), format='%Y-%m-%d%H')

To drop the Date and Hour columns after parsing them:

df2.drop(['Date' ,'Hour'], axis=1, inplace=True)

Short explanation:
zfill will pad the numeric string with zeros up the the specified number of characters.

Of course you can also specify a date_parser for pd.read_csv, but in this case it is imho much clearer and more pythonic to parse the dates after reading the csv.

Upvotes: 1

Related Questions