Reputation: 341
I have a data frame like the one below.
unit time s1 s2 ....
1 1 2 3
1 2 4 5
1 3 9 7
2 1 5 2
2 2 3 1
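For reproducibility, the rows shown above can be built as follows. This is only a sketch of the sample: the name `df` is an assumption, and the columns elided by `....` are left out rather than guessed.

```python
import pandas as pd

# Sample data copied from the table above; only the columns shown are included.
df = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "time": [1, 2, 3, 1, 2],
    "s1":   [2, 4, 9, 5, 3],
    "s2":   [3, 5, 7, 2, 1],
})
print(df)
```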
I would like to group the data by unit and keep the same number of trailing observations (ordered by time) for every unit, where that number is the size of the smallest group (unit 2 has only 2 observations). Then I want the s1 values of each unit in a separate column. Something like this:
unit_1 unit_2
4 5
9 3
Thank you.
Upvotes: 0
Views: 441
Reputation: 19307
This should solve your problem:
def f(col):
    # First, get the last 2 rows of each unit using .tail(2)
    dff = df[['unit', 'time', col]].sort_values(by=['unit', 'time']).groupby('unit').tail(2).copy()
    # Next, replace the actual time values with their rank within each unit,
    # so that times 2,3 become 1,2 and times 1,2 stay 1,2.
    dff['time'] = dff.groupby('unit')['time'].rank()
    # Finally, pivot over time and unit to get the columns needed for the correlation analysis
    dff = dff.pivot(index='time', columns='unit', values=col).reset_index(drop=True).add_prefix('unit_')
    return dff
f('s1')
unit unit_1 unit_2
0 4 5
1 9 3
Use this function for a faster runtime.
def f(col):
    # Keep the last 2 rows of each unit (assumes df is already sorted by time within each unit)
    filt = df[['unit', col]].groupby('unit').tail(2).copy()
    # Add a per-unit counter column to pivot on
    filt['count'] = filt.groupby('unit').cumcount()
    # Use the counter as the index and unit as the columns, then add the prefix
    filt = filt.pivot(index='count', columns='unit', values=col).reset_index(drop=True).add_prefix('unit_')
    return filt
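Both functions hard-code 2 as the number of rows to keep. Since the question asks for the size of the smallest group, here is a minimal sketch that computes that number instead. The function name `last_common_obs`, its parameters, and the helper column `pos` are made up for illustration, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question (only the columns shown there).
df = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "time": [1, 2, 3, 1, 2],
    "s1":   [2, 4, 9, 5, 3],
    "s2":   [3, 5, 7, 2, 1],
})

def last_common_obs(frame, col, unit_col="unit", time_col="time"):
    """Pivot the last k values of `col` per unit, where k is the smallest group size."""
    ordered = frame.sort_values([unit_col, time_col])
    # k = number of rows in the smallest unit
    k = int(ordered.groupby(unit_col).size().min())
    tail = ordered.groupby(unit_col).tail(k).copy()
    # Position of each row within its unit: 0 .. k-1
    tail["pos"] = tail.groupby(unit_col).cumcount()
    wide = tail.pivot(index="pos", columns=unit_col, values=col)
    # Prefix the column labels and drop the leftover axis names
    return wide.add_prefix("unit_").rename_axis(index=None, columns=None)

result = last_common_obs(df, "s1")
print(result)
```

With the sample data the smallest unit has 2 rows, so this keeps the same rows as the functions above.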
Upvotes: 2
Reputation: 26686
Group by unit and pass a list of positions to nth. Drop the unwanted columns, transpose the dataframe, and add the prefix unit_ to the names. Then transpose and ravel to combine the columns.
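# Note: this relies on nth returning the group key as the index (pandas before 2.0).
# In 2.0 and later nth acts as a filter and keeps the original row labels, so the column names differ.
g = df.groupby('unit', group_keys=False).nth([-1, -2]).drop(columns=['time', 's2']).T.add_prefix('unit_')
final = pd.DataFrame({'unit_1': g['unit_1'].values.T.ravel(),
                      'unit_2': g['unit_2'].values.T.ravel()})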
final
unit_1 unit_2
0 4 5
1 9 3
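Since nth has behaved differently across pandas releases, the same "one array per unit" idea can also be sketched without it, by iterating over the grouped s1 column. This is an alternative technique, not the code above; `n_last` is an assumed name for the number of trailing rows to keep, and the sample frame is rebuilt so the snippet is self-contained.

```python
import pandas as pd

# Sample data from the question (only the columns shown there).
df = pd.DataFrame({
    "unit": [1, 1, 1, 2, 2],
    "time": [1, 2, 3, 1, 2],
    "s1":   [2, 4, 9, 5, 3],
    "s2":   [3, 5, 7, 2, 1],
})

n_last = 2  # trailing observations to keep per unit (assumed, as in the answers)

# Iterating over a grouped column yields (unit, Series) pairs,
# so each unit contributes one array of its last n_last values.
final = pd.DataFrame({
    f"unit_{unit}": values.tail(n_last).to_numpy()
    for unit, values in df.sort_values(["unit", "time"]).groupby("unit")["s1"]
})
print(final)
```

Every unit must have at least `n_last` rows, otherwise the arrays have different lengths and the DataFrame constructor raises an error.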
Upvotes: 1
Reputation: 195
So, I made this solution:
import pandas as pd
import numpy as np
df = pd.DataFrame({'units': [1,1,1,2,2], 's1':[2,4,9,5,3]})
new_df = df.groupby('units').tail(2) # Taking the last 2 values
new_df
Out:
units s1
1 1 4
2 1 9
3 2 5
4 2 3
units_list = new_df.units.unique()  # Distinct units present
units_columns = []  # Column names
form_dict = {}
# We keep 2 values for each unit, so new_df has 2n rows,
# where n is the number of units in new_df.
n = 0
for unit in units_list:
    units_columns.append('unit_{}'.format(unit))
while n != len(new_df['s1']):
    for col in units_columns:
        # Take the next 2 values by position
        form_dict.update({col: new_df['s1'].iloc[n:n+2].values})
        n += 2
final_df = pd.DataFrame(form_dict)
final_df
And the result is:
unit_1 unit_2
0 4 5
1 9 3
Upvotes: 1