Reputation: 471
I think my code is close, but really screwed up. Basically, I have rows of sales data. One of the variables is a receipt number, while another is a string containing Units and Products as a string.
I'd like a new row for each unique product, retaining the relevant receipt number for that row. For instance, an entry for Receipt A with products L and Q should become two separate entries: Receipt A with L, and Receipt A with Q. So I'm trying to split the string into variables and also create a new entry under the same receipt identifier.
Anyway, but the time I'm done I've accidentally duplicated some of the entries. Any help with a) making it generate the right output and b) prettier/simpler code? (I also don't need the old variable Description).
Many thanks
import pandas as pd
import numpy as np
df = pd.DataFrame({"Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"], "Receipt number": ["1-1002","1-1001","1-1000"], "Description": ["1 x Capacino, 2 x Americano","1 x Americano","1 x Latte"]})
df
df2 = df['Description'].str.split(',').apply(pd.Series, 1).stack()
df2.index = df2.index.droplevel(-1)
df2.name = 'Product'
df = df.join(df2)
df.join(df['Product'].str.split(' x ', 1, expand=True).rename(columns={0:'Units', 1:'Product Name'}))
Upvotes: 0
Views: 83
Reputation: 10429
Well, you are close. Except for the last line. Here's what I did, following along with your code:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Date": ["9/26/17 2:33 PM", "9/26/17 2:23 PM", "9/26/17 2:22 PM"], "Receipt number": ["1-1002","1-1001","1-1000"], "Description": ["1 x Capacino, 2 x Americano","1 x Americano","1 x Latte"]})
df2 = df['Description'].str.split(',').apply(pd.Series, 1).stack()
df2.index = df2.index.droplevel(-1)
df2.name = 'Product'
df = df.join(df2)
# Here is where I diverge
df['Units'] = df['Product'].apply(lambda x: int(x.split(' x ')[0]))
df['Product'] = df['Product'].apply(lambda x: x.split(' x ')[-1])
df = df.drop('Description', axis=1)
Result
Date Receipt number Product Units
0 9/26/17 2:33 PM 1-1002 Capacino 1
0 9/26/17 2:33 PM 1-1002 Americano 2
1 9/26/17 2:23 PM 1-1001 Americano 1
2 9/26/17 2:22 PM 1-1000 Latte 1
Upvotes: 1