nt8
nt8

Reputation: 3

Aggregating Values for a specific ID in a Dataframe in Python

If I have a dataframe such as

id     quantity     date
 1     2.0          6-12-18
 1     3.0          6-20-18
 1     3.0          6-22-18
 1     1.0          5-12-18
 2     5.0          6-10-18
 2     1.0          6-15-18
 2     1.0          6-11-18
 3     4.0          7-10-18
 3     4.0          7-15-18
 3     4.0          7-16-18

I want to find the deviation for the 'quantity' column's values associated with a specific id.

I was thinking that I could aggregate the quantity values associated with a specific ID and order the quantity values by date, and eliminate duplicates from the list of integers created for each ID. My idea was to use df.groupby and maybe pd.series.unique

The goal is that it looks like this:

id     quantity                 date
 1     1.0, 3.0, 3.0, 2.0       5-12-18, 6-12-18, 6-20-18, 6-22-18 
 2     5.0, 1.0, 1.0            6-10-18, 6-11-18, 6-15-18
 3     4.0, 4.0, 4.0            7-10-18, 7-15-18, 7-16-18       

and then I wanted to create a new column within the dataframe where it would state whether the value in quantity increased, decreased, or stayed the same so it would look like this:

id     quantity                 trend
 1     1.0, 3.0, 3.0, 2.0       inc, same, dec 
 2     5.0, 1.0, 1.0            dec, same
 3     4.0, 4.0, 4.0            same 

thanks :)

Upvotes: 0

Views: 195

Answers (1)

gyoza
gyoza

Reputation: 2152

Input (df)

   id  quality       date
0   1      2.0 2018-06-12
1   1      3.0 2018-06-20
2   1      3.0 2018-06-22
3   1      1.0 2018-05-12
4   2      5.0 2018-06-10
5   2      1.0 2018-06-15
6   2      1.0 2018-06-11
7   3      4.0 2018-07-10
8   3      4.0 2018-07-15
9   3      4.0 2018-07-16

Code

# date column (lists)
df0 = df.groupby('id')['date'].apply(list).reset_index(drop=False)

# quality column (lists)
df1 = df.groupby('id')['quality'].apply(list).reset_index(drop=False)

# trend column (lists)
df['delta'] = df.quality.diff(1)
df.loc[df.delta > 0, 'trend'] = 'inc'
df.loc[df.delta == 0, 'trend'] = 'same'
df.loc[df.delta < 0, 'trend'] = 'dec'
df2 = df.groupby('id')['trend'].apply(list).apply(lambda x: x[1:]).reset_index(drop=False)

# merge all
df3 = pd.merge(df1, df0, on='id', how='left')
df3 = pd.merge(df3, df2, on='id', how='left')

# remove brackets
df3['quality'] = df3.quality.apply(lambda x: ", ".join(repr(e) for e in x))
df3['date'] = df3.date.apply(lambda x: ", ".join(x))
df3['trend'] = df3.trend.apply(lambda x: ", ".join(x))

Output (df3)

    id  quality             date                    trend
0   1   2.0, 3.0, 3.0, 1.0  6-12-18, 6-20-18, ...   inc, same, dec
1   2   5.0, 1.0, 1.0       6-10-18, 6-15-18, ...   dec, same
2   3   4.0, 4.0, 4.0       7-10-18, 7-15-18, ...   same, same

Upvotes: 1

Related Questions