Aggregating Values for a specific ID in a Dataframe in Python

Question

If I have a dataframe such as

id     quantity     date
 1     2.0          6-12-18
 1     3.0          6-20-18
 1     3.0          6-22-18
 1     1.0          5-12-18
 2     5.0          6-10-18
 2     1.0          6-15-18
 2     1.0          6-11-18
 3     4.0          7-10-18
 3     4.0          7-15-18
 3     4.0          7-16-18

I want to find the deviation for the 'quantity' column's values associated with a specific id.

I was thinking that I could aggregate the quantity values associated with a specific ID and order the quantity values by date, and eliminate duplicates from the list of integers created for each ID. My idea was to use df.groupby and maybe pd.series.unique

The goal is that it looks like this:

id     quantity                 date
 1     1.0, 3.0, 3.0, 2.0       5-12-18, 6-12-18, 6-20-18, 6-22-18 
 2     5.0, 1.0, 1.0            6-10-18, 6-11-18, 6-15-18
 3     4.0, 4.0, 4.0            7-10-18, 7-15-18, 7-16-18

and then I wanted to create a new column within the dataframe where it would state whether the value in quantity increased, decreased, or stayed the same so it would look like this:

id     quantity                 trend
 1     1.0, 3.0, 3.0, 2.0       inc, same, dec 
 2     5.0, 1.0, 1.0            dec, same
 3     4.0, 4.0, 4.0            same

thanks :)

gyoza · Accepted Answer

Input (df)

   id  quality       date
0   1      2.0 2018-06-12
1   1      3.0 2018-06-20
2   1      3.0 2018-06-22
3   1      1.0 2018-05-12
4   2      5.0 2018-06-10
5   2      1.0 2018-06-15
6   2      1.0 2018-06-11
7   3      4.0 2018-07-10
8   3      4.0 2018-07-15
9   3      4.0 2018-07-16

Code

# date column (lists)
df0 = df.groupby('id')['date'].apply(list).reset_index(drop=False)

# quality column (lists)
df1 = df.groupby('id')['quality'].apply(list).reset_index(drop=False)

# trend column (lists)
df['delta'] = df.quality.diff(1)
df.loc[df.delta > 0, 'trend'] = 'inc'
df.loc[df.delta == 0, 'trend'] = 'same'
df.loc[df.delta < 0, 'trend'] = 'dec'
df2 = df.groupby('id')['trend'].apply(list).apply(lambda x: x[1:]).reset_index(drop=False)

# merge all
df3 = pd.merge(df1, df0, on='id', how='left')
df3 = pd.merge(df3, df2, on='id', how='left')

# remove brackets
df3['quality'] = df3.quality.apply(lambda x: ", ".join(repr(e) for e in x))
df3['date'] = df3.date.apply(lambda x: ", ".join(x))
df3['trend'] = df3.trend.apply(lambda x: ", ".join(x))

Output (df3)

    id  quality             date                    trend
0   1   2.0, 3.0, 3.0, 1.0  6-12-18, 6-20-18, ...   inc, same, dec
1   2   5.0, 1.0, 1.0       6-10-18, 6-15-18, ...   dec, same
2   3   4.0, 4.0, 4.0       7-10-18, 7-15-18, ...   same, same

Aggregating Values for a specific ID in a Dataframe in Python

Answers (1)

Related Questions