gospecomid12

Reputation: 1012

CSV file dump to yaml file in python

I'm trying to dump a .csv file into a .yml file and have succeeded. The only problem is that the syntax in the .yml file is not how I want it.

My .csv file:

NAME,KEYWORDS
Adam,Football Hockey

The code where I read the .csv file and dump it into a .yml file:

import pandas
import yaml

# Read the whole CSV file with the pandas library
df = pandas.read_csv('keywords.csv')

# Dump the DataFrame into getData.yml as YAML
with open('getData.yml', 'w') as outfile:
    yaml.dump(
        df.to_dict(orient='records'),
        outfile,
        sort_keys=False,
        width=72,
        indent=4
    )

How the .yml output looks:

-   NAME: Adam
    KEYWORDS: Football Hockey

How I want it to look:

-   NAME: Adam
    KEYWORDS: Football, Hockey

I want a comma between Football and Hockey. But if I put that in the .csv file, the columns break, because the comma is already the field separator. How can I do this?

Upvotes: 0

Views: 3088

Answers (3)

frogcoder

Reputation: 1003

The accepted answer is perfectly good. It seems the task is converting a csv file into yaml. If that is the case, the pandas library is not really necessary, as the built-in csv module can read csv files.

import csv
import yaml

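# newline='' is what the csv module documentation recommends for file objects
with open('keywords.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    # split on whitespace, then re-join the keywords with ", "
    name_keywords = [{'NAME': n, 'KEYWORDS': ', '.join(k.split())}
                     for n, k in reader]

# Dump the list of records into getData.yml as YAML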
with open('getData.yml', 'w') as outfile:
    yaml.dump(
        name_keywords,
        outfile,
        sort_keys=False,
        width=72, 
        indent=4
    )
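
If you would rather not hard-code the column names, csv.DictReader takes the keys from the header row. This is only a sketch of the reading step: it uses an in-memory string (the `raw` variable is a stand-in I made up for keywords.csv) and assumes the column to rewrite is called KEYWORDS, as in the question. The resulting list can be passed to yaml.dump exactly as above.

```python
import csv
import io

# Hypothetical in-memory stand-in for keywords.csv from the question.
raw = "NAME,KEYWORDS\nAdam,Football Hockey\n"

reader = csv.DictReader(io.StringIO(raw))

# Copy each row, replacing KEYWORDS with a comma-separated version.
records = [
    {**row, "KEYWORDS": ", ".join(row["KEYWORDS"].split())}
    for row in reader
]

print(records)
```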

Upvotes: 0

ronpi

Reputation: 490

You have two options:

In a CSV file, a comma inside double quotes is not treated as a delimiter during parsing. So your CSV file would look as follows:

NAME,KEYWORDS
Adam,"Football, Hockey"

Alternatively, you can process the KEYWORDS column after reading it. This would add the following to your code:

df = pandas.read_csv('keywords.csv')
df["KEYWORDS"] = df["KEYWORDS"].apply(lambda x: ", ".join(x.split()))
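
To illustrate the first option without pandas, here is a small sketch using only the standard csv module. The `raw` string is a made-up in-memory stand-in for the quoted version of keywords.csv. It shows that the quoted comma stays inside the field when reading, and that csv.writer adds the quotes for you when writing a field that contains the delimiter.

```python
import csv
import io

# Hypothetical in-memory stand-in for keywords.csv with the comma quoted.
raw = 'NAME,KEYWORDS\nAdam,"Football, Hockey"\n'

rows = list(csv.reader(io.StringIO(raw)))
print(rows[1])  # the data row still has exactly two fields

# Writing: the default quoting mode quotes a field that contains the delimiter.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerow(["Adam", "Football, Hockey"])
print(out.getvalue())
```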

Upvotes: 1

Oddaspa

Reputation: 888

I reproduced your dataframe with:

import io
import pandas as pd

df = pd.read_csv(io.StringIO(
"""
NAME,KEYWORDS
Adam,Football Hockey
"""
), sep=",")

I assume that there can be multiple keywords each separated with a space. To insert commas you can use the apply() method that pandas provides.

df.KEYWORDS = df.KEYWORDS.apply(lambda k: k.replace(" ", ", "))

Then run the rest of your code to produce the desired outcome.
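
One caveat, in case your real data is less tidy than the example: replace(" ", ", ") turns every single space into a comma, so doubled or trailing spaces produce empty entries. The split/join approach from the other answers collapses any run of whitespace. The sketch below compares the two on a made-up sample string; the helper names are mine and only for illustration.

```python
def with_replace(keywords):
    # Every single space becomes ", ", including repeated or trailing ones.
    return keywords.replace(" ", ", ")


def with_split_join(keywords):
    # split() with no arguments drops runs of whitespace and the ends.
    return ", ".join(keywords.split())


sample = "Football  Hockey "  # two spaces in the middle, one trailing

print(repr(with_replace(sample)))     # 'Football, , Hockey, '
print(repr(with_split_join(sample)))  # 'Football, Hockey'
```

For the clean input in the question both give the same result.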

Upvotes: 0
