CunnyFunt
CunnyFunt

Reputation: 105

CSV - Split multiple-line cell into multiple cells

I’m currently doing some big data work. I have an issue in a .CSV where I need to split a multiple-line single-celled chunk of text, into individual cells. The below table shows the desired output. Currently, all of the 'ingredients' are in the same cell, with each ingredient on its own new line (Stack Overflow wouldn't allow me to create new lines in the same cell).

I need to write a script to split this single cell of ingredients into the below output, using each new line in the cell as a delimiter. The real use case I'm using this for is much more complex - over 200 'items', and anywhere between 50-150 'ingredients' per 'item'. I'm currently doing this manually in excel with a series of text to columns & transpose pastes, but it takes approximately 2-2.5 full work days to do.

Link to data

Code below

Item Ingredients
Coffee Coffee beans
Milk
Sugar
Water
import pandas as pd

df = pd.read_csv(r'd:\Python\menu.csv', delimiter=';', header=None)
headers = ["Item", "Ingredients"]
df.columns = headers
df["Ingredients"]=df["Ingredients"].str.split("\n")
df = df.explode("Ingredients").reset_index(drop=True)
df.to_csv(r"D:\Python\output.csv")

Upvotes: 1

Views: 1152

Answers (2)

Zach Young
Zach Young

Reputation: 11188

Here's how to do it with Python's standard csv^1 ^2 module:

import csv

writer = csv.writer(open('output.csv', 'w', newline=''))

reader = csv.reader(open('input.csv', newline=''))

writer.writerow(next(reader))  # copy header

for row in reader:
    item  = row[0]
    ingredients = row[1].split('\n')

    first_ingredient = ingredients[0]

    writer.writerow([item, first_ingredient])

    for ingredient in ingredients[1:]:
        writer.writerow([None, ingredient])  # None for a blank cell (under the item)

Given your small sample, I get this:

Item Ingredients
Coffee Coffee beans
Milk
Sugar
Water

Upvotes: 1

ferdy
ferdy

Reputation: 5024

Using your code and linked data change delimeter to a comma like below.

import pandas as pd

df = pd.read_csv('Inventory.csv', delimiter=',')
df["Software"]=df["Software"].str.split("\n")
df = df.explode("Software").reset_index(drop=True)

# Remove rows having empty string under Software column.
df = df[df['Software'].astype(bool)]

df = df.reset_index(drop=True)
df.to_csv("out_Inventory.csv")

print(df.to_string())

Output

        Hostname                                                                                                                Software
0    ServerName1        Windows Driver Package - Amazon Inc. (AWSNVMe) SCSIAdapter  (08/27/2019 1.3.2.53)  [version 08/27/2019 1.3.2.53]
1    ServerName1                                                                               Airlock Digital Client  [version 4.7.1.0]
2    ServerName1                                                                 AppFabric 1.1 for Windows Server  [version 1.1.2106.32]
3    ServerName1                                                                                   BlueStripe Collector  [version 8.0.3]
...

Upvotes: 2

Related Questions