Reputation: 327

Add Pandas column value based on ID

I have a following requirement with a sample csv file as below,

colA   colB    NewCol
 1     val1     val1
 1     val2     val1 + val2
 1     val3     val1 + val2 + val3
 2     val4     val4
 2     val5     val4 + val5
 2     val6     val4 + val5 + val6
 3     val7     val7
 3     val8     val7 + val8
 3     val9     val7 + val8 + val9

I want to add a new column to df, such that it add successive colB values only for same ID value. Col B can be Integer or String.

Upvotes: 2

Answers (2)

jezrael

Reputation: 863631

Use np.cumsum or pd.Series.cumsum what perfectly working with numeric and strings:

df['NewCol'] = df.groupby('colA')['colB'].apply(np.cumsum)

Or:

df['NewCol'] = df.groupby('colA')['colB'].apply(pd.Series.cumsum)

print (df)
   colA  colB        NewCol
0     1  val1          val1
1     1  val2      val1val2
2     1  val3  val1val2val3
3     2  val4          val4
4     2  val5      val4val5
5     2  val6  val4val5val6
6     3  val7          val7
7     3  val8      val7val8
8     3  val9  val7val8val9

Upvotes: 1

jpp

Reputation: 164843

Using groupby + cumsum:

df['NewCol'] = df.groupby('colA')['colB'].cumsum()

For summing strings, you can use a list comprehension with groupby:

df = df.sort_values('colA')

df['C'] = [''.join(g.iloc[:i+1].values) \
           for _, g in df.groupby('colA')['colB'] \
           for i in range(len(g.index))]

print(df)

   colA  colB             C
0     1  val1          val1
1     1  val2      val1val2
2     1  val3  val1val2val3
3     2  val4          val4
4     2  val5      val4val5
5     2  val6  val4val5val6
6     3  val7          val7
7     3  val8      val7val8
8     3  val9  val7val8val9

Explanation

First iterate groupby elements via for _, g in df.groupby(...)[...]. We ignore the index _ here as we are only interested in the values.
Then iterate range(len(g.index)), i.e. iterate over the number of items in the group.
Apply ''.join on the first i values. We use g.iloc[:i+1] to extract the first i values in a group.
Wrap all the above in a nested list comprehension.

Upvotes: 2

Add Pandas column value based on ID

Answers (2)

Related Questions