johnchase

Reputation: 13705

pandas get counts based on previous column

I would like to add a column to a pandas dataframe where the value increments within each group, starting from the value in another column. For instance, say I have the following dataframe:

df = pd.DataFrame([['a', 1], ['a', 1], ['b', 5], ['c', 10], ['c', 10], ['c', 10]], columns=['x', 'y'])
df

    x   y
0   a   1
1   a   1
2   b   5
3   c   10
4   c   10
5   c   10

Is there some pandas functionality that would return a series with an increasing value for each group? In other words, 'a' would start at 1, 'b' at 5 and 'c' at 10. The output series would be (1, 2, 5, 10, 11, 12), so it could be added to the original dataframe like so:

    x   y   z
0   a   1   1
1   a   1   2
2   b   5   5
3   c   10  10
4   c   10  11
5   c   10  12

I tried the following:

z = []
for start, length in zip(df.y.unique(), df.groupby('x').agg('count')['y']):
    z.append(list(range(start, length + start)))
np.array(z).flatten()
z

[[1, 2], [5], [10, 11, 12]]

This doesn't quite give me what I need. I'm not sure why the array does not flatten, and the approach seems overly complex for a seemingly simple task.

EDIT: The solution should be extendable to more complex dataframes as well, for instance:

df = pd.DataFrame([['a', 1], ['b', 5], ['c', 10], ['d', 5]], columns=['x', 'y'])
df = pd.concat([df] * 51, ignore_index=True)  # same rows as df.append([df]*50, ignore_index=True); DataFrame.append was removed in pandas 2.0

Here both the 'b' and 'd' values in column 'x' have a 'y' value equal to 5. In both of those cases the counting should start at 5.

Upvotes: 1

Views: 65

Answers (3)

Alex Petralia

Reputation: 1770

Here is a much uglier method than @piRSquared's:

def func(group):
    x = group['y'].head(1).values
    l = []
    for i in range(len(group)):
        l.append(x+i)
    return pd.Series(l, name='z')

x = df.groupby('x').apply(func).reset_index().drop('level_1', axis=1)
x['z'] = x['z'].apply(lambda x: x[0])
pd.concat([df, x['z']], axis=1)
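A somewhat tidier take on the same per-group idea might look like the sketch below. It assumes the groups are defined by column 'x' and that each group should start from its first 'y' value; the helper name `count_up` is made up for illustration. Because `transform` returns a result aligned to the original index, the `reset_index` and `concat` steps should not be needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [['a', 1], ['a', 1], ['b', 5], ['c', 10], ['c', 10], ['c', 10]],
    columns=['x', 'y'],
)

def count_up(s):
    # s holds the 'y' values of one group (hypothetical helper).
    # Start from the group's first value and add 0, 1, 2, ...
    return pd.Series(s.iloc[0] + np.arange(len(s)), index=s.index)

# transform keeps the original row order and index of df
df['z'] = df.groupby('x')['y'].transform(count_up)
print(df)
```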

Upvotes: 1

Ogi Moore

Reputation: 131

This isn't a pandas-specific answer, but to flatten the nested lists you can apply a simple list comprehension to what you currently have as z.

>>> z = [[1, 2], [5], [10, 11, 12]]
>>> z_flat = [num for sublist in z for num in sublist]
>>> z_flat
[1, 2, 5, 10, 11, 12]

EDIT: or, for a faster conversion, you can use itertools.chain()

In [5]: import itertools 

In [6]: z
Out[6]: [[1, 2], [5], [10, 11, 12]]

In [7]: merged = list(itertools.chain(*z))

In [8]: merged
Out[8]: [1, 2, 5, 10, 11, 12]
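As for why the original attempt looked unflattened, there are probably two causes. The sublists have different lengths, so NumPy cannot build a regular 2-D array from them (older versions fall back to a 1-D object array of lists, recent versions raise a ValueError). Also, flatten() returns a new array rather than changing z, so printing z shows the untouched list. A minimal sketch of two alternatives that cope with unequal lengths:

```python
import itertools

import numpy as np

z = [[1, 2], [5], [10, 11, 12]]

# chain.from_iterable walks the sublists lazily, without unpacking z
# into separate arguments the way chain(*z) does
z_flat = list(itertools.chain.from_iterable(z))

# np.concatenate joins the sublists end to end into a new 1-D array;
# the result has to be assigned, since z itself is not modified
z_arr = np.concatenate(z)

print(z_flat)
print(z_arr)
```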

Upvotes: 1

piRSquared

Reputation: 294258

try:

df['z'] = df.y + df.groupby('y').apply(lambda df: pd.Series(range(len(df)))).values
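For the extended case in the question's EDIT, where two different 'x' groups share the same 'y' and the rows are interleaved, a variant that groups on 'x' and uses `cumcount` may be more robust. This is a sketch under the assumption that 'x' defines the groups and 'y' is constant within each group. The frame is repeated only 3 times here to keep the output short.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 5], ['c', 10], ['d', 5]], columns=['x', 'y'])
df = pd.concat([df] * 3, ignore_index=True)

# cumcount numbers the rows within each 'x' group as 0, 1, 2, ...
# and returns a Series aligned to df's index, so the addition is row by row
df['z'] = df['y'] + df.groupby('x').cumcount()
print(df)
```

Since the result is index-aligned, it does not depend on the rows being sorted by group.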

Upvotes: 3
