Reputation: 161
I'm stuck on how to split one row of a pandas DataFrame into several rows.
I have a DataFrame with a column where each cell holds several values separated by \r\n:
   Color                              Shape  Price
0  Green  Rectangle\r\nTriangle\r\nOctangle     10
1   Blue            Rectangle\r\nTriangle     15
I need to split each such cell into separate rows, repeating the values of the other columns, e.g.
   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
What is a good way to do this?
Upvotes: 16
Views: 30997
Reputation: 384
This might not be the most efficient way to do it, but I can confirm that it works with the sample df. (DataFrame.append was removed in pandas 2.0, so the rows are collected in a list and turned into a DataFrame once at the end.)
import pandas as pd

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])

# Build one record per shape, copying Color and Price from the source row
rows = []
for index, row in df.iterrows():
    for shape in row['Shape'].split('\r\n'):
        rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})
new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
print(new_df)
Output:
   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
Upvotes: 4
Reputation: 150815
As mentioned in the comments, str.split() followed by explode does the job. If you are on a pandas version older than 0.25, where explode is not available, you can split with expand=True and then use melt:
(pd.concat((df.Shape.str.split('\r\n', expand=True),
            df[['Color', 'Price']]),
           axis=1)
   .melt(id_vars=['Color', 'Price'], value_name='Shape')
   .dropna()
)
Output:
   Color  Price variable      Shape
0  Green     10        0  Rectangle
1   Blue     15        0  Rectangle
2  Green     10        1   Triangle
3   Blue     15        1   Triangle
4  Green     10        2   Octangle
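The melt result above is grouped by split position rather than by original row, and it keeps the helper column. If the order from the question matters, one possible cleanup is sketched below. The names wide, tidy and part are made up for this illustration, and carrying the original row position through reset_index() is just one way to do it, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Color": ["Green", "Blue"],
        "Shape": ["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"],
        "Price": [10, 15],
    }
)

# Split into columns 0, 1, 2 and keep the original row position in an
# "index" column so it survives the melt.
wide = pd.concat(
    [df["Shape"].str.split("\r\n", expand=True), df[["Color", "Price"]]],
    axis=1,
).reset_index()

tidy = (
    wide.melt(id_vars=["index", "Color", "Price"], var_name="part", value_name="Shape")
    .dropna(subset=["Shape"])          # drop the padding from shorter rows
    .sort_values(["index", "part"])    # original row first, then split position
    .loc[:, ["Color", "Shape", "Price"]]
    .reset_index(drop=True)
)
print(tidy)
```

Sorting on the saved row position and then on the split position puts the three Green rows first and the two Blue rows after them.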
Upvotes: 2
Reputation: 4821
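First, split the Shape column on whitespace; str.split() with no arguments splits on any run of whitespace, including \r\n, and gives you a list of shapes per row. Then use df.explode to unpack each list into one row per element. Note that explode returns a new DataFrame, so assign the result if you want to keep it:
df["Shape"] = df.Shape.str.split()
df = df.explode("Shape")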
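One thing to watch with the whitespace split: it also breaks up values that contain spaces. The small comparison below uses a made-up shape name, "Right Triangle", and a throwaway frame called df_ws, neither of which comes from the question, to show the difference from an explicit "\r\n" separator.

```python
import pandas as pd

# Hypothetical data: one shape name contains a space.
df_ws = pd.DataFrame(
    {"Color": ["Green"], "Shape": ["Right Triangle\r\nCircle"], "Price": [10]}
)

# No argument: splits on every whitespace run, so the space counts too.
by_whitespace = df_ws.assign(Shape=df_ws["Shape"].str.split()).explode("Shape")

# Explicit separator: only the line break splits the cell.
by_crlf = df_ws.assign(Shape=df_ws["Shape"].str.split("\r\n")).explode("Shape")

print(by_whitespace["Shape"].tolist())  # ['Right', 'Triangle', 'Circle']
print(by_crlf["Shape"].tolist())        # ['Right Triangle', 'Circle']
```

For the sample data in the question both variants give the same rows, since none of the shape names contain spaces.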
Upvotes: 3
Reputation: 13426
You can do:
df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))
Output:
   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
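If you would rather not overwrite the Shape column in place, the same split-and-explode idea can be wrapped in a small function. This is only a sketch: explode_lines is a name invented here, and the pattern r"\r?\n" is an assumption that some cells might use a bare \n instead of \r\n (pandas treats a multi-character pattern passed to str.split as a regular expression by default).

```python
import pandas as pd

def explode_lines(frame, column):
    """Return a new frame with one row per line found in `column`."""
    # r"\r?\n" matches both "\r\n" and a bare "\n".
    parts = frame[column].str.split(r"\r?\n")
    # assign() builds a copy, so the caller's frame is left untouched.
    return frame.assign(**{column: parts}).explode(column).reset_index(drop=True)

df = pd.DataFrame(
    {
        "Color": ["Green", "Blue"],
        # The first cell mixes "\r\n" and "\n" on purpose.
        "Shape": ["Rectangle\r\nTriangle\nOctangle", "Rectangle\r\nTriangle"],
        "Price": [10, 15],
    }
)

result = explode_lines(df, "Shape")
print(result)
```

The original df still holds the unsplit strings afterwards, which makes it easier to rerun the step while experimenting.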
Upvotes: 17