sacuL

Reputation: 51425

shifting pandas series for only some entries

I've got a dataframe that has a Time Series (made up of strings) with some missing information:

# Generate a toy dataframe:
import pandas as pd
data = {'Time': ['0'+str(i)+':15:45' for i in range(10)]}
data['Time'][4] = 'unknown'
data['Time'][8] = 'unknown'

df = pd.DataFrame(data)

# df
       Time
0  00:15:45
1  01:15:45
2  02:15:45
3  03:15:45
4   unknown
5  05:15:45
6  06:15:45
7  07:15:45
8   unknown
9  09:15:45

I would like the unknown entries to match the entry above, resulting in this dataframe:

# desired_df
       Time
0  00:15:45
1  01:15:45
2  02:15:45
3  03:15:45
4  03:15:45
5  05:15:45
6  06:15:45
7  07:15:45
8  07:15:45
9  09:15:45

What is the best way to achieve this?

Upvotes: 0

Views: 47

Answers (2)

sacuL

Reputation: 51425

One way to do this is with pandas' shift: create a new column holding the data in Time shifted down by one row, copy from it where Time is 'unknown', and then drop it. There may be a cleaner way to achieve this, though:

# Create new column with the shifted time data
df['Time2'] = df['Time'].shift()
# Replace the data in Time with the data in your new column where necessary
df.loc[df['Time'] == 'unknown', 'Time'] = df.loc[df['Time'] == 'unknown', 'Time2']
# Drop your new column
df = df.drop('Time2', axis=1)

print(df)

       Time
0  00:15:45
1  01:15:45
2  02:15:45
3  03:15:45
4  03:15:45
5  05:15:45
6  06:15:45
7  07:15:45
8  07:15:45
9  09:15:45

EDIT: as pointed out by Zero, the new column step can be skipped altogether:

df.loc[df['Time'] == 'unknown', 'Time'] = df['Time'].shift()
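Note that shift only looks one row back, so two 'unknown' entries in a row would leave the second one as 'unknown'. The question's data has no such case, but if yours might, a minimal sketch (the consecutive-unknown toy data below is an assumption, not from the question) is to turn 'unknown' into missing values and forward fill:

```python
import pandas as pd

# Toy data with two consecutive 'unknown' entries (an assumed edge case,
# not present in the question's example)
df = pd.DataFrame({'Time': ['00:15:45', 'unknown', 'unknown', '03:15:45']})

# Replace 'unknown' with missing values, then carry the last known
# value forward over every gap, however long it is
df['Time'] = df['Time'].mask(df['Time'] == 'unknown').ffill()

print(df)
```

The column stays made up of strings; only the placeholder entries change.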

Upvotes: 0

usernamenotfound

Reputation: 1580

If you're intent on working with time series data, I would recommend converting the column to a datetime type and then forward filling the blanks:

import pandas as pd
data = {'Time': ['0'+str(i)+':15:45' for i in range(10)]}
data['Time'][4] = 'unknown'
data['Time'][8] = 'unknown'
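df = pd.DataFrame(data)
# 'unknown' cannot be parsed, so errors='coerce' turns it into NaT
df['Time'] = pd.to_datetime(df['Time'], errors='coerce')
# Forward fill each missing value with the last valid entry
df = df.ffill()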

However, if you are getting this data from a CSV file or another source that you read with a pandas.read_* function, you should use the na_values argument of those functions to mark 'unknown' as an NA value:

df = pd.read_csv('example.csv', na_values = 'unknown')
df = df.ffill()

You can also pass a list instead of a single string; the values you pass are added to the default list of NA values.
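As a rough sketch of the list form: the in-memory CSV text and the second placeholder 'missing' below are made up for illustration, standing in for a real file such as example.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content; 'missing' is an assumed second placeholder
csv_text = "Time\n00:15:45\nunknown\n02:15:45\nmissing\n"

# Both placeholders are read as NaN, on top of pandas' default NA strings
df = pd.read_csv(io.StringIO(csv_text), na_values=['unknown', 'missing'])

# Carry the last valid entry forward into the gaps
df = df.ffill()

print(df)
```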

However, if you want to keep the column as strings, I would recommend just doing a find and replace:

import numpy as np
df['Time'] = np.where(df['Time'] == 'unknown', df['Time'].shift(), df['Time'])

Upvotes: 1
