Reputation:
I have a DataFrame as shown below. I need to split the Fruit column on the comma delimiter so that each value gets its own row, with the ID repeated. It's easier to understand once you see the DataFrames below:
ID Fruit
10000 Apple, Orange, Pear
10001 Apple, Banana
I want the DataFrame below:
ID Fruit
10000 Apple
10000 Orange
10000 Pear
10001 Apple
10001 Banana
Upvotes: 0
Views: 41
Reputation: 13407
Try (DataFrame.explode requires pandas 0.25 or later):
df['Fruit'] = df['Fruit'].str.split(", ")
df = df.explode('Fruit')
Outputs:
ID Fruit
0 10000 Apple
0 10000 Orange
0 10000 Pear
1 10001 Apple
1 10001 Banana
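Note that explode repeats the original index (0, 0, 0, 1, 1). If you want a fresh index, or the spacing after the commas is not always a single space, something like the following might help. This is only a sketch built on the sample data from the question; the variable name result and the whitespace handling are my own assumptions, not something the question asked for.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [10000, 10001],
    "Fruit": ["Apple, Orange, Pear", "Apple, Banana"],
})

result = (
    df.assign(Fruit=df["Fruit"].str.split(","))   # one list of fruits per row
      .explode("Fruit")                           # one row per list element
      .reset_index(drop=True)                     # renumber rows 0..n-1
      # Assumption: stray whitespace around each item should be dropped.
      .assign(Fruit=lambda d: d["Fruit"].str.strip())
)

print(result)
```

Using assign instead of overwriting df['Fruit'] leaves the original DataFrame untouched, which is handy if you still need the comma-separated column later.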
Upvotes: 1
Reputation: 7635
If df looks like this:
>>> df = pd.DataFrame({'ID': [10000, 10001], 'Fruit': ['Apple, Orange, Pear', 'Apple, Banana']})
>>> print(df)
ID Fruit
0 10000 Apple, Orange, Pear
1 10001 Apple, Banana
you can use the pandas.DataFrame.apply() method to build a new column in which each entry is a list of dictionaries, one dictionary per output row. After that, you can concatenate these lists and build a new DataFrame from the result. The code is as follows:
>>> df['new'] = df.apply(lambda row: [{'ID': row.ID, 'Fruit': item} for item in row.Fruit.split(', ')], axis=1)
>>> df_new = pd.DataFrame(df.new.sum())
>>> print(df_new)
ID Fruit
0 10000 Apple
1 10000 Orange
2 10000 Pear
3 10001 Apple
4 10001 Banana
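One caveat: df.new.sum() concatenates the per-row lists one at a time, which can get slow on a large DataFrame. A possible variation on the same idea is to build all the dictionaries in a single flat list comprehension and skip the helper column entirely. This is a sketch rather than a drop-in replacement; the name records is just illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [10000, 10001],
    "Fruit": ["Apple, Orange, Pear", "Apple, Banana"],
})

# One dictionary per (ID, fruit) pair, collected directly into a flat list.
records = [
    {"ID": id_, "Fruit": item}
    for id_, fruits in zip(df["ID"], df["Fruit"])
    for item in fruits.split(", ")
]

df_new = pd.DataFrame(records)
print(df_new)
```

This gives the same rows and the same 0 to 4 index as the apply() version, without adding a 'new' column to df.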
Upvotes: 0