Reputation: 69
I have an excel file that I have read into python as a dataframe as shown.
Store List
Ralphs bag1
apple
pear
Ralphs bag2
toilet paper
Albertsons bag3
magazines
paper
I am attempting to expand the second column into two columns as shown.
Bag Item
bag1 Apple
bag1 Pear
bag2 Toilet Paper
bag3 Magazines
bag3 paper
The store column has Na's whenever an item is listed in the original table. Is there some pandas function that has a seperate function based on the Na's? or would it be easier to create a Series with all the bags and use that as some cutting point?
Upvotes: 1
Views: 349
Reputation: 14689
You can add a column that assign to each item its bag then groupby bag id
as follows:
df['bag'] = "bag"+df.List.str.contains('bag').cumsum().astype(str)
df[['bag','List']].groupby('bag').apply(lambda x: x[1:]).reset_index(drop=True)
or, instead of groupby
, filter out the elements where df.List == df.bag
(which will be bag1 == bag1
, bag2 == bag2
.. )
df['bag'] = "bag"+df.List.str.contains('bag').cumsum().astype(str)
df[df.List != df.bag].drop('Store', axis=1)
Overall execution
In [83]: df
Out[83]:
Store List
0 Ralphs bag1
1 NaN apple
2 NaN pear
3 Ralphs bag2
4 NaN toilet paper
5 Albertsons bag3
6 NaN magazines
7 Nan paper
In [84]: df['bag'] = "bag"+df.List.str.contains('bag').cumsum().astype(str)
In [85]: df[['bag','List']].groupby('bag').apply(lambda x: x[1:]).reset_index(drop=True)
Out[85]:
bag List
0 bag1 apple
1 bag1 pear
2 bag2 toilet paper
3 bag3 magazines
4 bag3 paper
Upvotes: 0
Reputation: 402553
Setup
df
Store List
0 Ralphs bag1
1 Ralphs apple
2 Ralphs pear
3 Ralphs bag2
4 Ralphs toilet paper
5 Albertsons bag3
6 Albertsons magazines
7 Albertsons paper
You can use df.where
with str.contains
, extracting all rows which start with bag
, and then filter out spurious rows.
df = df.assign(Bag=df.List.\
where(df.List.str.contains('bag')).ffill()).drop('Store', 1)
df[df.List != df.Bag]
List Bag
1 apple bag1
2 pear bag1
4 toilet paper bag2
6 magazines bag3
7 paper bag3
Upvotes: 1