George

Reputation: 161

split rows in pandas dataframe

I'm stuck on how to split the rows of a pandas DataFrame.

I have a DataFrame with a column where several values are packed into a single cell, separated by \r\n:

    Color                              Shape  Price
0  Green  Rectangle\r\nTriangle\r\nOctangle     10
1   Blue              Rectangle\r\nTriangle     15 

I need to split each such cell into separate rows, repeating the values of the other columns, e.g.:

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15

What is a good way to do this?

Upvotes: 16

Views: 30997

Answers (4)

MBA Coder

Reputation: 384

This might not be the most efficient way to do it but I can confirm that it works with the sample df:

import pandas as pd

data = [['Green', 'Rectangle\r\nTriangle\r\nOctangle', 10], ['Blue', 'Rectangle\r\nTriangle', 15]]
df = pd.DataFrame(data, columns=['Color', 'Shape', 'Price'])

# DataFrame.append was removed in pandas 2.0, so collect one dict per shape
# and build the new frame once at the end.
rows = []
for index, row in df.iterrows():
    # Split the cell on the \r\n separator and emit one record per shape
    for shape in row['Shape'].split('\r\n'):
        rows.append({'Color': row['Color'], 'Shape': shape, 'Price': row['Price']})

new_df = pd.DataFrame(rows, columns=['Color', 'Shape', 'Price'])
print(new_df)

Output:

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15

Upvotes: 4

Quang Hoang

Reputation: 150815

As commented, str.split() followed by explode is helpful. explode was added in pandas 0.25, so if you are on an older version you can split with expand=True and use melt instead:

(pd.concat( (df.Shape.str.split('\r\n', expand=True), 
            df[['Color','Price']]),
          axis=1)
   .melt(id_vars=['Color', 'Price'], value_name='Shape')
   .dropna()
)

Output:

   Color  Price variable      Shape
0  Green     10        0  Rectangle
1   Blue     15        0  Rectangle
2  Green     10        1   Triangle
3   Blue     15        1   Triangle
4  Green     10        2   Octangle
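The melt result above keeps a helper `variable` column and orders the rows by split position rather than by original row. A minimal sketch of one way to tidy it up, assuming the sample data from the question; the names `wide` and `tidy` are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "Color": ["Green", "Blue"],
    "Shape": ["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"],
    "Price": [10, 15],
})

# One column per split position, next to the columns to keep
wide = pd.concat(
    [df["Shape"].str.split("\r\n", expand=True), df[["Color", "Price"]]],
    axis=1,
)

tidy = (
    wide.reset_index()                        # keep the original row number
        .melt(id_vars=["index", "Color", "Price"], value_name="Shape")
        .dropna(subset=["Shape"])             # drop the padding from shorter rows
        .sort_values(["index", "variable"])   # original row first, then split position
        .drop(columns=["index", "variable"])
        .reset_index(drop=True)
)[["Color", "Shape", "Price"]]

print(tidy)
```

Sorting on the saved row number and then on `variable` restores the order asked for in the question before the helper columns are dropped.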

Upvotes: 2

Darren Christopher

Reputation: 4821

First, split the Shape column on whitespace (str.split() with no arguments splits on any whitespace, including \r\n), which gives you a list of shapes per row. Then use df.explode to unpack each list into its own rows:

df["Shape"] = df.Shape.str.split()
df.explode("Shape")
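One caveat worth checking against your own data: splitting on whitespace also breaks up values that contain spaces. A small sketch of the difference, using a made-up multi-word shape ("Right Triangle" is hypothetical and not in the question's data):

```python
import pandas as pd

# Hypothetical data: "Right Triangle" is an invented multi-word shape
df = pd.DataFrame({
    "Color": ["Green"],
    "Shape": ["Rectangle\r\nRight Triangle"],
    "Price": [10],
})

# No argument: split on any run of whitespace, including the space
by_whitespace = df["Shape"].str.split().explode().tolist()

# Explicit separator: split only on \r\n
by_newline = df["Shape"].str.split("\r\n").explode().tolist()

print(by_whitespace)  # ['Rectangle', 'Right', 'Triangle']
print(by_newline)     # ['Rectangle', 'Right Triangle']
```

If the cells can contain spaces, passing the separator explicitly is the safer choice.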

Upvotes: 3

Sociopath

Reputation: 13426

You can do:

df["Shape"]=df["Shape"].str.split("\r\n")
print(df.explode("Shape").reset_index(drop=True))

Output:

   Color      Shape  Price
0  Green  Rectangle     10
1  Green   Triangle     10
2  Green   Octangle     10
3   Blue  Rectangle     15
4   Blue   Triangle     15
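The snippet above overwrites the Shape column of df with lists. If the original frame should stay untouched, a possible variant is to do the split inside assign, which returns a copy; this is only a sketch using the question's sample data, and `result` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "Color": ["Green", "Blue"],
    "Shape": ["Rectangle\r\nTriangle\r\nOctangle", "Rectangle\r\nTriangle"],
    "Price": [10, 15],
})

result = (
    df.assign(Shape=df["Shape"].str.split("\r\n"))  # new frame with lists in Shape
      .explode("Shape")                             # one row per list element
      .reset_index(drop=True)                       # renumber rows 0..n-1
)

print(result)
```

Here df still holds the original \r\n-separated strings afterwards, so it can be reused for other transformations.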

Upvotes: 17
