Reputation: 43
Just a quick question.
I've a CSV, with a lot of columns. And I've 1 column named : Cuisine, with lot of values.
name,Cuisine
Real Talent Cafe,"Italian, American, Pizza, Mediterranean, European, Fusion"
Dogma,"International, Mediterranean, Barbecue, Spanish, Fusion"
Taberna El Callejon,"Mediterranean, European, Spanish"
Astor,"International, Mediterranean, European, Fusion"
La Gaditana Castellana,"Spanish, Seafood, International, Diner, Wine Bar"
I would like to make a new CSV from this CSV, with 2 columns : - Name - Cuisine (by splitting the 1st CSV)
This is the script I created, I select only the 2 columns how interest me : Name & Cuisine :
# -*- coding: utf-8 -*-
from itertools import chain
import numpy as np
import pandas as pd
df = pd.read_csv('res_madrid.csv', usecols=['name','Cuisine'])
items_count = df["Cuisine"].str.count(",") +1
pd.DataFrame({"name": np.repeat(df["name"], items_count),
"Cuisine": list(chain.from_iterable(df["Cuisine"].str.split(",")))})
and i get the following error :
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 471, in repeat
return _wrapfunc(a, 'repeat', repeats, axis=axis)
File "/usr/lib64/python3.6/site-packages/numpy/core/fromnumeric.py", line 56, in _wrapfunc
return getattr(obj, method)(*args, **kwds)
File "/usr/lib64/python3.6/site-packages/pandas/core/series.py", line 1157, in repeat
new_index = self.index.repeat(repeats)
File "/usr/lib64/python3.6/site-packages/pandas/core/indexes/base.py", line 862, in repeat
return self._shallow_copy(self._values.repeat(repeats))
ValueError: count < 0
Please note that if you do the test, copying the datas I shared you, it will work ... The issue appear when I load a CSV file containing more columns, and I use the "usecols" parameter.
And the expected result is the following :
name Cuisine
0 Real Talent Cafe Italian
0 Real Talent Cafe American
0 Real Talent Cafe Pizza
0 Real Talent Cafe Mediterranean
0 Real Talent Cafe European
0 Real Talent Cafe Fusion
1 Dogma International
1 Dogma Mediterranean
1 Dogma Barbecue
1 Dogma Spanish
1 Dogma Fusion
2 Taberna El Callejon Mediterranean
2 Taberna El Callejon European
2 Taberna El Callejon Spanish
3 Astor International
3 Astor Mediterranean
3 Astor European
3 Astor Fusion
4 La Gaditana Castellana Spanish
4 La Gaditana Castellana Seafood
4 La Gaditana Castellana International
4 La Gaditana Castellana Diner
4 La Gaditana Castellana Wine Bar
Edit : The error come because I've null values en column Cuisine. How I can avoid that ?
Thanks for your help :) Regards Alexandre
Upvotes: 1
Views: 114
Reputation: 1286
If you want a solution without apply
and list comprehesion you can do this:
pd.DataFrame(df.Cuisine.str.split(',').values.tolist(), index=df.Name)\
.stack().reset_index().drop('level_1', axis=1)
Upvotes: 0
Reputation: 36
How about this
pd.concat([Series(row['name'], row['Cuisine'].split(','))
for index, row in df.iterrows()]).reset_index()
Then you just have to rename the columns
Upvotes: 1
Reputation: 3770
data = pd.read_csv(#path to txt file)
data
name Cuisine
0 Real Talent Cafe Italian, American, Pizza, Mediterranean, Europ...
1 Dogma International, Mediterranean, Barbecue, Spanis...
2 Taberna El Callejon Mediterranean, European, Spanish
3 Astor International, Mediterranean, European, Fusion
4 La Gaditana Castellana Spanish, Seafood, International, Diner, Wine Bar
Use
data.set_index('name')['Cuisine'].apply(lambda x: x.split(',')).apply(pd.Series).stack().reset_index().drop('level_1', axis=1)
data.columns = ['name', 'cusisine']
Output
data.head()
name cusisine
0 Real Talent Cafe Italian
1 Real Talent Cafe American
2 Real Talent Cafe Pizza
3 Real Talent Cafe Mediterranean
4 Real Talent Cafe European
Upvotes: 1