Reputation: 963
I'm looking for a fast way to accomplish the following task.
Let's say I have the following dataframe:
       value
index
1      'a'
2      'b'
3      'c'
4      'd'
And I want to expand it to the following dataframe:
       value  cum_value
index
1      'a'    []
2      'b'    ['a']
3      'c'    ['a', 'b']
4      'd'    ['a', 'b', 'c']
What is the most performant way to solve my problem?
Upvotes: 1
Views: 562
Reputation: 4618
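# Concatenate the strings cumulatively, split each result into characters, then shift down one row.
# This only works when every value is a single character.
df['cum_value'] = df['value'].cumsum().apply(lambda char: [c for c in char]).shift()
# shift() leaves NaN in the first row; use the first index label, which is 1 in the question, not 0.
df.at[df.index[0], 'cum_value'] = []
EDIT (thanks to Jab for the comment): apply(list) does the same thing more simply:
df['cum_value'] = df['value'].cumsum().apply(list).shift()
df.at[df.index[0], 'cum_value'] = []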
Upvotes: 1
Reputation: 1167
Wrap each value in a one-element list and shift the column down by one row. The shift leaves NaN in the first row, which we can replace with an empty list using df.at. A cumulative sum then concatenates the lists.
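import pandas as pd

df = pd.DataFrame(['a', 'bb', 'hi mom', 'this is a test'])
df[1] = df[0].apply(lambda x: [x]).shift()  # wrap each value in a list, then shift down one row
df.at[0, 1] = []                            # replace the NaN left by shift()
df[1] = df[1].cumsum()                      # cumulative list concatenation
print(df)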
                0                1
0               a               []
1              bb              [a]
2          hi mom          [a, bb]
3  this is a test  [a, bb, hi mom]
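For comparison, the same wrap, shift, and accumulate idea can be sketched in plain Python with itertools.accumulate. This is only an illustration, not the answer's code: the helper name previous_values is hypothetical, and it assumes Python 3.8+ for the initial argument.

```python
from itertools import accumulate


def previous_values(values):
    """Return, for each position i, a new list holding values[0:i]."""
    # Wrap every value in a one-element list, as in the pandas version.
    singletons = [[v] for v in values]
    # With initial=[], accumulate yields [], [v0], [v0, v1], ..., [v0, ..., vn-1].
    running = accumulate(singletons, lambda acc, item: acc + item, initial=[])
    # Dropping the last element (the full list) plays the role of shift().
    return list(running)[:-1]


if __name__ == "__main__":
    print(previous_values(['a', 'bb', 'hi mom', 'this is a test']))
```

With a dataframe, the result could presumably be assigned as df[1] = previous_values(df[0].tolist()). Each step copies the running list, so the total work grows quadratically with the number of rows, which is unavoidable given that the output itself has quadratic size.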
Upvotes: 1
Reputation: 323226
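Here is one way to match your output: append a separator that does not occur in any of your strings (here '~'), shift, build the cumulative string, strip the trailing separator, and split on it. Note that the first row actually holds [''] (which pandas displays as []), because splitting an empty string returns [''].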
s = (df.value+'~').shift().fillna('').cumsum().str[:-1].str.split('~')
index
1           []
2          [a]
3       [a, b]
4    [a, b, c]
Name: value, dtype: object
df['New'] = s
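The separator trick has two edge cases worth checking: the separator must not occur in the data, and splitting an empty string gives [''] rather than []. The plain-Python sketch below illustrates both under stated assumptions; the helper name prefixes_via_separator is hypothetical and this is not the answer's pandas code.

```python
def prefixes_via_separator(values, sep="~"):
    """For each position i, rebuild values[0:i] by joining and splitting on sep."""
    # The round trip is only lossless if sep never appears in the data.
    if any(sep in v for v in values):
        raise ValueError(f"separator {sep!r} occurs in the data")
    result = []
    for i in range(len(values)):
        joined = sep.join(values[:i])
        # "".split(sep) returns [""], so the empty prefix is handled explicitly.
        result.append(joined.split(sep) if i > 0 else [])
    return result


if __name__ == "__main__":
    print(prefixes_via_separator(['a', 'b', 'c', 'd']))
```

Joining every prefix from scratch keeps the sketch short; it is meant to show the edge cases, not to be fast.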
Upvotes: 3