Starbucks
Starbucks

Reputation: 1568

How to insert space in Pandas Column String

This column has bad date/time information in it:

import pandas as pd
df = pd.DataFrame({
    'date': ['1/25/201612:00:00AM','2/25/201712:00:00AM','3/25/201812:00:00AM',
             '4/25/201912:00:00AM','5/25/201912:00:00AM','6/25/201912:00:00AM']})

I tried this function below, however it produces a column of NaNs:

def insert_space(string, integer):
    return string[0:integer] + ' ' + string[integer:]
insert_space(df['date'], 9)

Desired Output example (any date format works!):

    date
0   1/25/2016 12:00:00AM
1   2/25/2017 12:00:00AM
2   3/25/2018 12:00:00AM
3   4/25/2019 12:00:00AM
4   5/25/2019 12:00:00AM
5   6/25/2019 12:00:00AM

Or

    date
0   1/25/2016
1   2/25/2017 
2   3/25/2018 
3   4/25/2019 
4   5/25/2019 
5   6/25/2019

Upvotes: 2

Views: 7784

Answers (3)

PeptideWitch
PeptideWitch

Reputation: 2349

Applying a function to every row of a specified column works like this:

df['date'].apply(lambda x: insert_space(x, 9), axis=1)

Note that, if working with datetime objects, you'll need to modify the function accordingly. A datetime object, such as datetime.time(), is not subscriptable and will raise a TypeError if you try and run it through your insert_space function. str(datetime.time()) will return a string, however.

Upvotes: 3

Vadim
Vadim

Reputation: 642

As it stands, the function you provide just returns a value, which is immediately trashed.

Here is a solution using basic for loop (can be trivially converted to list comprehension or functionalized).

import pandas as pd

# First format
df = pd.DataFrame({
    'date': ['1/25/201612:00:00AM','2/25/201712:00:00AM','3/25/201812:00:00AM',
             '4/25/201912:00:00AM','5/25/201912:00:00AM','6/25/201912:00:00AM']})

for i in range(len(df)):
    df['date'][i] = df['date'][i][:-10] + " " + df['date'][i][-10:]

print(df)
#                     date
# 0  1/25/2016  12:00:00AM
# 1  2/25/2017  12:00:00AM
# 2  3/25/2018  12:00:00AM
# 3  4/25/2019  12:00:00AM
# 4  5/25/2019  12:00:00AM
# 5  6/25/2019  12:00:00AM

# Second format
df = pd.DataFrame({
    'date': ['1/25/201612:00:00AM','2/25/201712:00:00AM','3/25/201812:00:00AM',
             '4/25/201912:00:00AM','5/25/201912:00:00AM','6/25/201912:00:00AM']})

for i in range(len(df)):
    df['date'][i] = df['date'][i][:-10]

print(df)

#         date
# 0  1/25/2016
# 1  2/25/2017
# 2  3/25/2018
# 3  4/25/2019
# 4  5/25/2019
# 5  6/25/2019

UPDATE: Below are list comprehensions for the respective calls that should be far more efficient:

df['date'] = [v[:-10] + " " + v[-10:] for v in df['date']]
df['date'] = [v[:-10] for v in df['date']]

Upvotes: 0

andre
andre

Reputation: 447

how about something like this:

df['date'] = pd.to_datetime(df['date'], format="%m/%d/%Y%I:%M:%S%p")

you can find the explanation for the format here: https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

Upvotes: 0

Related Questions