Visiony10

Reputation: 25

Get string before a whitespace on a Pandas series in a dataframe

I am preparing data for plotting, but I'm running into issues applying functions to dataframes in Pandas.

This is my dataframe:

What I need to do is to get only the date from the timestamp. So in the current dataframe, the expected result should look like this:

             timestamp    action
0           2020-03-03 pagevisit
1           2020-03-03 pagevisit
2           2020-03-03 pagevisit
3           2020-03-03 pagevisit
4           2020-03-03 pagevisit

I have around 100,000 records that I need to clean and get only the date. I tried

df['timestamp'] = df['timestamp'].apply(lambda x: x.split(' ')[0])

And it returns an error:

AttributeError: 'Timestamp' object has no attribute 'split'

I also tried

df['timestamp'] = df.apply(lambda x: x['timestamp'].split(' ')[0])

But it returns

return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 111, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: 'timestamp'

I feel this should be a fairly easy task, but I have been checking for the past hour and still can't get it to work. My pandas version is 1.0.1, and I honestly do not know the cause. Please help.

Upvotes: 0

Views: 274

Answers (2)

Yosua

Reputation: 421

Looking at the error, it seems that the values in the timestamp column are of type pd.Timestamp, not str, which is why .split() is not available.

(check documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Timestamp.html)

If you just want to get the date as a string, you can do the following:

df['timestamp'] = df['timestamp'].apply(lambda x: str(x.date()))

(or you can just use x.date() to get a datetime.date object instead of a string)
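With around 100,000 rows, the vectorized .dt accessor may be a better fit than a per-element apply. The following is only a sketch: the sample dataframe is made up to stand in for the one in the question (which is not shown), and it assumes the column already has a datetime64 dtype.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2020-03-03 12:13:56",
        "2020-03-03 18:45:10",
    ]),
    "action": ["pagevisit", "pagevisit"],
})

# Option 1: datetime.date objects (the column becomes object dtype)
dates = df["timestamp"].dt.date

# Option 2: plain strings in YYYY-MM-DD form
date_strings = df["timestamp"].dt.strftime("%Y-%m-%d")

# Option 3: keep the datetime64 dtype, with the time part set to midnight
normalized = df["timestamp"].dt.normalize()

print(date_strings)
```

Which option to assign back to df['timestamp'] depends on what the plotting code expects: strings for categorical labels, or a datetime dtype if the axis should be treated as dates.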

Upvotes: 1

Rakesh

Reputation: 82785

Use .dt.date on the column (a single Timestamp has .date(), but a Series needs the .dt accessor)

Ex:

df['timestamp'] = df['timestamp'].dt.date

Demo:

print(pd.Timestamp('2020-03-03 12:13:56+09:00').date())
# -->2020-03-03
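As for the KeyError in the question's second attempt: DataFrame.apply defaults to axis=0, so the lambda receives each column as a Series indexed by row number, and looking up 'timestamp' in that index fails. A minimal sketch of the difference, using made-up sample data since the original dataframe is not shown:

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's dataframe
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2020-03-03 12:13:56", "2020-03-04 08:00:00"]),
    "action": ["pagevisit", "pagevisit"],
})

# axis=0 (the default): the lambda gets each column, indexed by 0, 1, ...
# so the label 'timestamp' is not found in that index
try:
    df.apply(lambda col: col["timestamp"])
    raised = False
except KeyError:
    raised = True

# axis=1: the lambda gets each row, so the column label lookup works
row_dates = df.apply(lambda row: str(row["timestamp"].date()), axis=1)

print(raised)
print(row_dates.tolist())
```

Row-wise apply works but is usually the slowest route; for a single column, operating on df['timestamp'] directly is simpler.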

Upvotes: 1
