jonboy
jonboy

Reputation: 366

Format timestamps in a column - Pandas

I am trying to format a column of timestamps in a df so each time point represents 0.1 of a second. Below is the function I'm using:

d = ({   
    'Time' : ['2010-07-27 09:25:31','2010-07-27 09:25:31.1000','2010-07-27 09:25:31.2000','2010-07-27 09:25:31.3000','2010-07-27 09:25:31.4000','2010-07-27 09:25:31.5000','2010-07-27 09:25:32'],
    'Value' : [np.nan,np.nan,np.nan,np.nan,-8,np.nan,-6],
   })

df = pd.DataFrame(data=d)

df['Time']= pd.to_datetime(df['Time']) 

def format_time(col):
    t = col    
    s = t.dt.strftime('%Y-%m-%d %H:%M:%S.%f')
    tail = s[-7:]
    f = round(float(tail), 3)
    temp = "%.1f" % f
    return "%s%s" % (s[:-7], temp[1:])

df['Time'] = format_time(df['Time'])

Intended Output:

                     Time  Value
0 2010-07-27 09:25:31.1      NaN
1 2010-07-27 09:25:31.2      NaN
2 2010-07-27 09:25:31.3      NaN
3 2010-07-27 09:25:31.4      NaN
4 2010-07-27 09:25:31.5     -8.0
5 2010-07-27 09:25:31.6      NaN
6 2010-07-27 09:25:31.7     -6.0

Upvotes: 2

Views: 92

Answers (2)

jezrael
jezrael

Reputation: 863166

I believe you need:

def format_time(t):
    #removed dt and t = col       
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    tail = s[-7:]
    print (tail)
    f = round(float(tail), 3)
    print (f)
    temp = "%.1f" % f
    return "%s%s" % (s[:-7], temp[1:])


df['Time'] = df['Time'].apply(format_time)
print (df)
                    Time  Value
0  2010-07-27 09:25:31.0    NaN
1  2010-07-27 09:25:31.1    NaN
2  2010-07-27 09:25:31.2    NaN
3  2010-07-27 09:25:31.3    NaN
4  2010-07-27 09:25:31.4   -8.0
5  2010-07-27 09:25:31.5    NaN
6  2010-07-27 09:25:32.0   -6.0

Or Series.dt.round for round by miliseconds and then removed last 2 values of miliseconds:

df['Time'] = df['Time'].dt.round('L').astype(str).str[:-2]
print (df)
                    Time  Value
0  2010-07-27 09:25:31.0    NaN
1  2010-07-27 09:25:31.1    NaN
2  2010-07-27 09:25:31.2    NaN
3  2010-07-27 09:25:31.3    NaN
4  2010-07-27 09:25:31.4   -8.0
5  2010-07-27 09:25:31.5    NaN
6  2010-07-27 09:25:32.0   -6.0

Question is - is round necessary? I test it and depends of data, if more valuees after in %f parameter of timestamps (miliseconds), then necessary:

d = ({   
    'Time' : ['2010-07-27 09:25:31','2010-07-27 09:25:31.1000',
              '2010-07-27 09:25:31.2000','2010-07-27 09:25:31.3000',
              '2010-07-27 09:25:31.499','2010-07-27 09:25:31.5153',
              '2010-07-27 09:25:32'],
    'Value' : [np.nan,np.nan,np.nan,np.nan,-8,np.nan,-6],
   })

df = pd.DataFrame(data=d)
#print (df)

df['Time']= pd.to_datetime(df['Time']) 

def format_time(t):
    #removed dt and t = col       
    s = t.strftime('%Y-%m-%d %H:%M:%S.%f')
    tail = s[-7:]
    f = round(float(tail), 3)
    temp = "%.1f" % f
    return "%s%s" % (s[:-7], temp[1:])


df['Time0'] = df['Time'].apply(format_time)


df['Time1'] = df['Time'].dt.round('L').astype(str).str[:-2]

df['Time2'] = df.Time.astype(str).str[:-2]

print (df)
                        Time  Value                  Time0  \
0 2010-07-27 09:25:31.000000    NaN  2010-07-27 09:25:31.0   
1 2010-07-27 09:25:31.100000    NaN  2010-07-27 09:25:31.1   
2 2010-07-27 09:25:31.200000    NaN  2010-07-27 09:25:31.2   
3 2010-07-27 09:25:31.300000    NaN  2010-07-27 09:25:31.3   
4 2010-07-27 09:25:31.499000   -8.0  2010-07-27 09:25:31.5   
5 2010-07-27 09:25:31.515300    NaN  2010-07-27 09:25:31.5   
6 2010-07-27 09:25:32.000000   -6.0  2010-07-27 09:25:32.0   

                   Time1                     Time2  
0  2010-07-27 09:25:31.0  2010-07-27 09:25:31.0000  
1  2010-07-27 09:25:31.1  2010-07-27 09:25:31.1000  
2  2010-07-27 09:25:31.2  2010-07-27 09:25:31.2000  
3  2010-07-27 09:25:31.3  2010-07-27 09:25:31.3000  
4  2010-07-27 09:25:31.4  2010-07-27 09:25:31.4990  
5  2010-07-27 09:25:31.5  2010-07-27 09:25:31.5153  
6  2010-07-27 09:25:32.0  2010-07-27 09:25:32.0000  

Upvotes: 3

Sergey Bushmanov
Sergey Bushmanov

Reputation: 25209

You may achieve what you want with:

df.Time.astype(str).str[:-2]
0    2010-07-27 09:25:31.0
1    2010-07-27 09:25:31.1
2    2010-07-27 09:25:31.2
3    2010-07-27 09:25:31.3
4    2010-07-27 09:25:31.4
5    2010-07-27 09:25:31.5
6    2010-07-27 09:25:32.0
Name: Time, dtype: object

Though it's of object type now, not timestamp anymore.

Upvotes: 2

Related Questions