CSV - Split multiple-line cell into multiple cells

Question

I'm currently doing some big data work. I have a .CSV file in which I need to split a single cell containing multiple lines of text into individual cells. The table below shows the desired output. Currently, all of the 'ingredients' are in the same cell, with each ingredient on its own line (Stack Overflow wouldn't allow me to create new lines in the same cell).

I need to write a script that splits this single cell of ingredients into the output below, using each new line in the cell as a delimiter. The real use case is much more complex: over 200 'items', with anywhere between 50 and 150 'ingredients' per 'item'. I'm currently doing this manually in Excel with a series of Text to Columns and transpose pastes, and it takes approximately 2 to 2.5 full work days.
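For context: a spreadsheet cell with several lines is normally saved to CSV as one quoted field that contains embedded newline characters, which is why splitting that field on "\n" recovers the individual ingredients. A minimal illustration with the standard csv module, using a hypothetical in-memory sample rather than the real file:

```python
import csv
import io

# Hypothetical sample: the Ingredients field is quoted and spans four physical lines
raw = 'Item,Ingredients\nCoffee,"Coffee beans\nMilk\nSugar\nWater"\n'

rows = list(csv.reader(io.StringIO(raw)))
# The reader returns the whole quoted field as ONE string with embedded newlines
ingredients = rows[1][1].split("\n")
print(len(rows), ingredients)  # prints: 2 ['Coffee beans', 'Milk', 'Sugar', 'Water']
```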

Link to data

Code below

Item	Ingredients
Coffee	Coffee beans
	Milk
	Sugar
	Water

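import pandas as pd

# Read the semicolon-delimited file; the column names are assigned below
df = pd.read_csv(r'd:\Python\menu.csv', delimiter=';', header=None)
headers = ["Item", "Ingredients"]
df.columns = headers
# Turn each multi-line cell into a list, one element per line
df["Ingredients"] = df["Ingredients"].str.split("\n")
# Give every list element its own row (the Item value is repeated on each row)
df = df.explode("Ingredients").reset_index(drop=True)
# index=False avoids writing pandas' row index as an extra column
df.to_csv(r"D:\Python\output.csv", index=False)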
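Note that `explode` repeats the item name on every resulting row, whereas the desired output leaves the Item cell blank below the first ingredient. A minimal pandas sketch of one way to get that layout, assuming the file is semicolon-delimited with the multi-line cell quoted; the `sample` string is a hypothetical stand-in for `menu.csv`:

```python
import io

import pandas as pd

# Hypothetical stand-in for menu.csv: semicolon-delimited, multi-line cell quoted
sample = 'Coffee;"Coffee beans\nMilk\nSugar\nWater"\n'

df = pd.read_csv(io.StringIO(sample), sep=";", header=None, names=["Item", "Ingredients"])
df["Ingredients"] = df["Ingredients"].str.split("\n")
df = df.explode("Ingredients")

# explode keeps the original index, so a repeated index label marks a continuation row
is_continuation = df.index.duplicated()
df = df.reset_index(drop=True)
df.loc[is_continuation, "Item"] = ""  # blank the Item cell under the first ingredient

print(df.to_csv(index=False))
```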

Zach Young · Accepted Answer

Here's how to do it with Python's standard csv module:

import csv

# "with" closes (and flushes) both files when the block ends
with open('input.csv', newline='') as f_in, open('output.csv', 'w', newline='') as f_out:
    reader = csv.reader(f_in)
    writer = csv.writer(f_out)

    writer.writerow(next(reader))  # copy header

    for row in reader:
        item = row[0]
        # A quoted multi-line cell arrives as one string with embedded newlines
        ingredients = row[1].split('\n')

        first_ingredient = ingredients[0]

        writer.writerow([item, first_ingredient])

        for ingredient in ingredients[1:]:
            writer.writerow([None, ingredient])  # None for a blank cell (under the item)

Given your small sample, I get this:

Item	Ingredients
Coffee	Coffee beans
	Milk
	Sugar
	Water
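The same loop can be packaged as a function that takes file-like objects, which makes it easy to check on an in-memory sample and to pass `delimiter=';'` for a semicolon-separated file like the one in the question. This is a sketch: `split_multiline_column` is a hypothetical name, and it uses `str.splitlines()` instead of `split('\n')` so that cells saved with Windows-style `\r\n` line breaks also split cleanly.

```python
import csv
import io


def split_multiline_column(src, dst, column=1, delimiter=","):
    """Hypothetical helper: copy CSV rows from src to dst, putting each line of
    the multi-line cell in `column` on its own row and leaving other cells blank."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter)
    writer.writerow(next(reader))  # copy header unchanged
    for row in reader:
        # splitlines() handles both \n and \r\n; keep one row even for an empty cell
        lines = row[column].splitlines() or [""]
        first = list(row)
        first[column] = lines[0]
        writer.writerow(first)
        for line in lines[1:]:
            continuation = [""] * len(row)  # blank cells under the item
            continuation[column] = line
            writer.writerow(continuation)


# Usage on a hypothetical semicolon-delimited sample
src = io.StringIO('Item;Ingredients\nCoffee;"Coffee beans\nMilk\nSugar\nWater"\n')
dst = io.StringIO()
split_multiline_column(src, dst, delimiter=";")
print(dst.getvalue())
```

For real files, open both with `newline=''` as in the answer above and pass the file objects in place of the `StringIO` buffers.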
