Reputation: 3
If I have a dataframe such as
id quantity date
1 2.0 6-12-18
1 3.0 6-20-18
1 3.0 6-22-18
1 1.0 5-12-18
2 5.0 6-10-18
2 1.0 6-15-18
2 1.0 6-11-18
3 4.0 7-10-18
3 4.0 7-15-18
3 4.0 7-16-18
I want to find the deviation for the 'quantity' column's values associated with a specific id.
I was thinking that I could aggregate the quantity values associated with a specific ID and order the quantity values by date, and eliminate duplicates from the list of integers created for each ID. My idea was to use df.groupby and maybe pd.series.unique
The goal is that it looks like this:
id quantity date
1 1.0, 3.0, 3.0, 2.0 5-12-18, 6-12-18, 6-20-18, 6-22-18
2 5.0, 1.0, 1.0 6-10-18, 6-11-18, 6-15-18
3 4.0, 4.0, 4.0 7-10-18, 7-15-18, 7-16-18
and then I wanted to create a new column within the dataframe where it would state whether the value in quantity increased, decreased, or stayed the same so it would look like this:
id quantity trend
1 1.0, 3.0, 3.0, 2.0 inc, same, dec
2 5.0, 1.0, 1.0 dec, same
3 4.0, 4.0, 4.0 same
thanks :)
Upvotes: 0
Views: 195
Reputation: 2152
Input (df
)
id quality date
0 1 2.0 2018-06-12
1 1 3.0 2018-06-20
2 1 3.0 2018-06-22
3 1 1.0 2018-05-12
4 2 5.0 2018-06-10
5 2 1.0 2018-06-15
6 2 1.0 2018-06-11
7 3 4.0 2018-07-10
8 3 4.0 2018-07-15
9 3 4.0 2018-07-16
Code
# date column (lists)
df0 = df.groupby('id')['date'].apply(list).reset_index(drop=False)
# quality column (lists)
df1 = df.groupby('id')['quality'].apply(list).reset_index(drop=False)
# trend column (lists)
df['delta'] = df.quality.diff(1)
df.loc[df.delta > 0, 'trend'] = 'inc'
df.loc[df.delta == 0, 'trend'] = 'same'
df.loc[df.delta < 0, 'trend'] = 'dec'
df2 = df.groupby('id')['trend'].apply(list).apply(lambda x: x[1:]).reset_index(drop=False)
# merge all
df3 = pd.merge(df1, df0, on='id', how='left')
df3 = pd.merge(df3, df2, on='id', how='left')
# remove brackets
df3['quality'] = df3.quality.apply(lambda x: ", ".join(repr(e) for e in x))
df3['date'] = df3.date.apply(lambda x: ", ".join(x))
df3['trend'] = df3.trend.apply(lambda x: ", ".join(x))
Output (df3
)
id quality date trend
0 1 2.0, 3.0, 3.0, 1.0 6-12-18, 6-20-18, ... inc, same, dec
1 2 5.0, 1.0, 1.0 6-10-18, 6-15-18, ... dec, same
2 3 4.0, 4.0, 4.0 7-10-18, 7-15-18, ... same, same
Upvotes: 1