Reputation: 13705
I would like to add a column to a pandas dataframe whose values increment within each group, starting from the value in another column. For instance, say I have the following dataframe.
df = pd.DataFrame([['a', 1], ['a', 1], ['b', 5], ['c', 10], ['c', 10], ['c', 10]], columns=['x', 'y'])
df
x y
0 a 1
1 a 1
2 b 5
3 c 10
4 c 10
5 c 10
Is there some pandas functionality that would return a series that increases within each group? In other words, 'a' would start with 1, 'b' with 5, and 'c' with 10. The output series would be (1, 2, 5, 10, 11, 12), so it could be added to the original dataframe like so:
x y z
0 a 1 1
1 a 1 2
2 b 5 5
3 c 10 10
4 c 10 11
5 c 10 12
I tried the following:
z = []
for start, length in zip(df.y.unique(), df.groupby('x').agg('count')['y']):
    z.append(list(range(start, length + start)))
np.array(z).flatten()
z
[[1, 2], [5], [10, 11, 12]]
This doesn't quite give me what I need. I'm not sure why the array does not flatten, and the approach seems overly complex for a seemingly simple task.
EDIT: The solution should be extendable to more complex dataframes as well, for instance:
df = pd.DataFrame([['a', 1], ['b', 5], ['c', 10], ['d', 5]], columns=['x', 'y'])
df = pd.concat([df] * 51, ignore_index=True)  # original plus 50 copies; DataFrame.append was removed in pandas 2.0
Here both the 'b' and 'd' values in column 'x' have a 'y' value equal to 5. In both of those instances the counting should start at 5.
Upvotes: 1
Views: 65
Reputation: 1770
Here is a way uglier method compared to @piRSquared's:
def func(group):
    # starting value for this group (a one-element array)
    x = group['y'].head(1).values
    l = []
    for i in range(len(group)):
        l.append(x + i)
    return pd.Series(l, name='z')

x = df.groupby('x').apply(func).reset_index().drop('level_1', axis=1)
# each entry is a one-element array; unwrap it to a scalar
x['z'] = x['z'].apply(lambda x: x[0])
pd.concat([df, x['z']], axis=1)
Upvotes: 1
Reputation: 131
While this is not a pandas-specific answer, you can flatten the nested lists you currently have in z with a simple list comprehension.
>>> z = [[1, 2], [5], [10, 11, 12]]
>>> z_flat = [num for sublist in z for num in sublist]
>>> z_flat
[1, 2, 5, 10, 11, 12]
EDIT: or, for a faster conversion, you can use itertools.chain():
In [5]: import itertools
In [6]: z
Out[6]: [[1, 2], [5], [10, 11, 12]]
In [7]: merged = list(itertools.chain(*z))
In [8]: merged
Out[8]: [1, 2, 5, 10, 11, 12]
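As for why np.array(z).flatten() in the question leaves the nesting intact: the sublists have different lengths, so NumPy cannot build a regular 2-D array from them (depending on the NumPy version, it either builds a 1-D array of list objects or raises an error). Also, flatten() returns a new array and never modifies z itself. A minimal sketch, assuming NumPy is available, that joins the sublists with np.concatenate instead (the name z_flat is just illustrative):

```python
import numpy as np

# Ragged nested list from the question
z = [[1, 2], [5], [10, 11, 12]]

# np.concatenate joins a sequence of 1-D array-likes end to end,
# so the sublists do not need to have the same length
z_flat = np.concatenate(z)
print(z_flat.tolist())
```

The result is a NumPy array; call .tolist() if you want a plain Python list.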
Upvotes: 1
Reputation: 294258
Try:
df['z'] = df.y + df.groupby('y').apply(lambda df: pd.Series(range(len(df)))).values
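A related sketch, which is a different technique from the line above: GroupBy.cumcount numbers the rows within each group 0, 1, 2, ... while keeping the original index, so the result aligns with df even when the groups are interleaved. Grouping by 'x' here is an assumption based on the question's edit, where two labels share the same starting value but should count separately.

```python
import pandas as pd

df = pd.DataFrame(
    [['a', 1], ['a', 1], ['b', 5], ['c', 10], ['c', 10], ['c', 10]],
    columns=['x', 'y'],
)

# cumcount gives each row its 0-based position within its 'x' group,
# indexed like df, so it can be added to 'y' directly
df['z'] = df['y'] + df.groupby('x').cumcount()
print(df['z'].tolist())
```

Because the addition aligns on the index rather than on position, this should not depend on the rows being sorted by group.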
Upvotes: 3