Reputation: 248
I want to reduce my data. My initial dataframe looks as follows:
index | time [hh:mm:ss] | value1 | value2 |
---|---|---|---|
0 | 0 days 00:00:00.000000 | 3 | 4 |
1 | 0 days 00:00:04.000000 | 5 | 2 |
2 | 0 days 00:02:02.002300 | 7 | 9 |
3 | 0 days 00:02:03.000000 | 9 | 7 |
4 | 0 days 03:02:03.000000 | 4 | 3 |
Now I want to reduce my data so that only the first row of every new minute (and likewise of every new hour and day) is kept. In other words: only the first line of a new minute should be kept, and all remaining lines of that minute should be dropped. The resulting table should look as follows:
index | time | value1 | value2 |
---|---|---|---|
0 | 0 days 00:00:00.000000 | 3 | 4 |
2 | 0 days 00:02:02.002300 | 7 | 9 |
4 | 0 days 03:02:03.000000 | 4 | 3 |
Any ideas how to approach this?
Upvotes: 0
Views: 1134
Reputation: 1734
import pandas as pd
from io import StringIO

d = '''index,time,value1,value2
0,0 days 00:00:00.000000,3,4
1,0 days 00:00:04.000000,5,2
2,0 days 00:02:02.002300,7,9
3,0 days 00:02:03.000000,9,7
4,0 days 03:02:03.000000,4,3'''
df = pd.read_csv(StringIO(d))
# Strip the leading "0 days " (7 characters) and parse the rest as a time of day
df['time1'] = pd.to_datetime(df['time'].str.slice(7))
df.set_index('time1', inplace=True)
# Keep the first row of each (hour, minute) group; the day part is ignored here
df.groupby([df.index.hour, df.index.minute]).head(1).sort_index().reset_index(drop=True)
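The grouping above keys on hour and minute only, so rows from different days that share an hour and minute would land in the same group. A minimal sketch of one way around that, assuming the column holds strings that `pd.to_timedelta` can parse: truncate each timedelta to its whole minute and group on that. The names `minute_key` and `first_per_minute` and the extra day-1 row are made up for illustration and are not part of the question's data.

```python
import pandas as pd

# Sample data from the question; the last row (day 1) is a made-up addition
# to show that the day component is taken into account.
df = pd.DataFrame({
    "time": [
        "0 days 00:00:00.000000",
        "0 days 00:00:04.000000",
        "0 days 00:02:02.002300",
        "0 days 00:02:03.000000",
        "0 days 03:02:03.000000",
        "1 days 03:02:10.000000",  # hypothetical extra row
    ],
    "value1": [3, 5, 7, 9, 4, 8],
    "value2": [4, 2, 9, 7, 3, 1],
})

# Parse the strings as timedeltas and truncate each one to its whole minute.
minute_key = pd.to_timedelta(df["time"]).dt.floor("min")

# head(1) keeps the first row of every group, with its original index label.
first_per_minute = df.groupby(minute_key).head(1)
print(first_per_minute)
```

Because the key is a full timedelta, `0 days 03:02` and `1 days 03:02` stay in separate groups, and the integer columns are not converted to floats.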
Upvotes: 0
Reputation: 863741
The column holds timedeltas, so it is possible to create a TimedeltaIndex and use DataFrame.resample with a 1-minute frequency together with Resampler.first. Resampling adds a row for every minute in the range, so the rows that contain only NaNs are removed afterwards:
df.index = pd.to_timedelta(df['time [hh:mm:ss]'])
df = df.resample('1Min').first().dropna(how='all').reset_index(drop=True)
print (df)
          time [hh:mm:ss]  value1  value2
0  0 days 00:00:00.000000     3.0     4.0
1  0 days 00:02:02.002300     7.0     9.0
2  0 days 03:02:03.000000     4.0     3.0
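To make the intermediate step visible, here is a self-contained sketch of the same resample approach on the question's data. It also casts the value columns back to integers, since the NaN rows created for empty minutes turn them into floats (as the `3.0`, `4.0` in the output show). The variable names `resampled` and `reduced` and the final `astype` call are additions for illustration, and the cast assumes the value columns contain no genuine missing values.

```python
import pandas as pd

df = pd.DataFrame({
    "time [hh:mm:ss]": [
        "0 days 00:00:00.000000",
        "0 days 00:00:04.000000",
        "0 days 00:02:02.002300",
        "0 days 00:02:03.000000",
        "0 days 03:02:03.000000",
    ],
    "value1": [3, 5, 7, 9, 4],
    "value2": [4, 2, 9, 7, 3],
})

# Use the parsed timedeltas as the index so that resample can bin by minute.
df.index = pd.to_timedelta(df["time [hh:mm:ss]"])

# One row per 1-minute bin; bins without data are filled with NaN.
resampled = df.resample("1min").first()
print(len(df), "rows before,", len(resampled), "rows after resampling")

# Drop the all-NaN rows of the empty bins and restore a plain integer index.
reduced = resampled.dropna(how="all").reset_index(drop=True)

# Undo the float upcast caused by the temporary NaN rows.
reduced = reduced.astype({"value1": "int64", "value2": "int64"})
print(reduced)
```

Note that `reset_index(drop=True)` renumbers the rows 0, 1, 2, so the original labels 0, 2, 4 from the question's expected output are not preserved by this route.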
Upvotes: 3
Reputation: 176
You could extract the D:HH:MM using apply and multiple splits, and then delete the duplicates, choosing the first value.
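# Build a "D:HH:MM" key for every row by splitting the timedelta string
dms = df['time [hh:mm:ss]'].apply(lambda x: ':'.join([x.split(' days ')[0], *x.split('days ')[1].split(':')[:2]]))
# drop_duplicates() keeps the first occurrence and returns index labels, so select with .loc
df.loc[dms.drop_duplicates().index]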
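The same string-key idea can be sketched without `apply`, using the vectorized `.str` accessor and a boolean mask. This assumes every value is a string of the form `D days HH:MM:SS.ffffff`, so that cutting off everything after the last colon leaves the day, hour, and minute. The names `minute_key` and `reduced` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "time [hh:mm:ss]": [
        "0 days 00:00:00.000000",
        "0 days 00:00:04.000000",
        "0 days 00:02:02.002300",
        "0 days 00:02:03.000000",
        "0 days 03:02:03.000000",
    ],
    "value1": [3, 5, 7, 9, 4],
    "value2": [4, 2, 9, 7, 3],
})

# Split once from the right at ':' and keep the left part,
# e.g. "0 days 00:02:02.002300" becomes "0 days 00:02".
minute_key = df["time [hh:mm:ss]"].str.rsplit(":", n=1).str[0]

# duplicated() flags every repeat of a key after its first occurrence,
# so the negated mask keeps exactly the first row of each minute.
reduced = df[~minute_key.duplicated()]
print(reduced)
```

Selecting with a boolean mask avoids any mix-up between positions and index labels, and the original labels 0, 2, 4 are kept.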
Upvotes: 0