Alexandre Nogard
Alexandre Nogard

Reputation: 43

Split columns into a csv with panda

Just a quick question.

I've a CSV, with a lot of columns. And I've 1 column named : Cuisine, with lot of values.

name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Taberna El Callejon,"Mediterranean, European, Spanish"
Astor,"International, Mediterranean, European, Fusion"
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"

I would like to make a new CSV from this CSV, with 2 columns : - Name - Cuisine (by splitting the 1st CSV)

This is the script I created, I select only the 2 columns how interest me : Name & Cuisine :

# -*- coding: utf-8 -*-
from itertools import chain
import numpy as np
import pandas as pd

df = pd.read_csv('res_madrid.csv', usecols=['name','Cuisine'])
items_count = df["Cuisine"].str.count(",") +1

pd.DataFrame({"name": np.repeat(df["name"], items_count),
    "Cuisine": list(chain.from_iterable(df["Cuisine"].str.split(",")))})

and i get the following error :

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 471, in repeat
    return _wrapfunc(a, 'repeat', repeats, axis=axis)
  File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
    return getattr(obj, method)(*args, **kwds)
  File "/usr/lib64/python3.6/site-packages/pandas/core/series.py", line 1157, in repeat
    new_index = self.index.repeat(repeats)
  File "/usr/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 862, in repeat
    return self._shallow_copy(self._values.repeat(repeats))
ValueError: count < 0

Please note that if you do the test, copying the datas I shared you, it will work ... The issue appear when I load a CSV file containing more columns, and I use the "usecols" parameter.

And the expected result is the following :

                     name         Cuisine
0        Real Talent Cafe         Italian
0        Real Talent Cafe        American
0        Real Talent Cafe           Pizza
0        Real Talent Cafe   Mediterranean
0        Real Talent Cafe        European
0        Real Talent Cafe          Fusion
1                   Dogma   International
1                   Dogma   Mediterranean
1                   Dogma        Barbecue
1                   Dogma         Spanish
1                   Dogma          Fusion
2     Taberna El Callejon   Mediterranean
2     Taberna El Callejon        European
2     Taberna El Callejon         Spanish
3                   Astor   International
3                   Astor   Mediterranean
3                   Astor        European
3                   Astor          Fusion
4  La Gaditana Castellana         Spanish
4  La Gaditana Castellana         Seafood
4  La Gaditana Castellana   International
4  La Gaditana Castellana           Diner
4  La Gaditana Castellana        Wine Bar

Edit : The error come because I've null values en column Cuisine. How I can avoid that ?

Thanks for your help :) Regards Alexandre

Upvotes: 1

Views: 114

Answers (3)

JoergVanAken
JoergVanAken

Reputation: 1286

If you want a solution without apply and list comprehesion you can do this:

pd.DataFrame(df.Cuisine.str.split(',').values.tolist(), index=df.Name)\
.stack().reset_index().drop('level_1', axis=1)

Upvotes: 0

SUMANTH_09
SUMANTH_09

Reputation: 36

How about this

pd.concat([Series(row['name'], row['Cuisine'].split(','))              
                for index, row in df.iterrows()]).reset_index()

Then you just have to rename the columns

Upvotes: 1

iamklaus
iamklaus

Reputation: 3770

data = pd.read_csv(#path to txt file)

data

                     name                                            Cuisine
0        Real Talent Cafe  Italian, American, Pizza, Mediterranean, Europ...
1                   Dogma  International, Mediterranean, Barbecue, Spanis...
2     Taberna El Callejon                   Mediterranean, European, Spanish
3                   Astor     International, Mediterranean, European, Fusion
4  La Gaditana Castellana   Spanish, Seafood, International, Diner, Wine Bar

Use

data.set_index('name')['Cuisine'].apply(lambda x: x.split(',')).apply(pd.Series).stack().reset_index().drop('level_1', axis=1)
data.columns = ['name', 'cusisine']

Output

 data.head()


               name        cusisine
0  Real Talent Cafe         Italian
1  Real Talent Cafe        American
2  Real Talent Cafe           Pizza
3  Real Talent Cafe   Mediterranean
4  Real Talent Cafe        European

Upvotes: 1

Related Questions