Reputation: 435
I have a csv file where there are columns Position and Quantity(Menge). I have to add copy of rows where(in my case it is Menge) Quantity >1 and I did it using :
file_to_check = file_to_check.loc[file_to_check.index.repeat(file_to_check.Menge)].reset_index(drop=True)
this works great, copied exactly as I wanted but I additionally have to update the Positions for them. For example :
Position Menge Product
200 3 a
200 3 a
200 3 a
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
400 7 b
200 4 c
200 4 c
200 4 c
200 4 c
I want it to look like this
Position Menge Product
200.1 3 a
200.2 3 a
200.3 3 a
400.1 7 b
400.2 7 b
400.3 7 b
400.4 7 b
400.5 7 b
400.6 7 b
400.7 7 b
200.1 4 c
200.2 4 c
200.3 4 c
200.4 4 c
.
.
.
Afterwards I know I can change the Menge(Quantity) by using :
selected.loc[:, 'Menge'] = 1
I tried using for loop and loc as:
counter=0
if selected.loc[selected.Menge >1]:
counter=selected['Menge']
i=1
while counter>=1:
selected['Pos.']+=i/10
i+=1
counter-=1
But I keep getting the error :
'ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().'
I searched for the answer but nothing is really helping me. Some help is needed since I am pretty new to python and pandas.
So I edited my question. I have different Products but with some of them have same Pos. . How can I change for each product the Pos. and not add as if all of them are one product.as shown in the table
Upvotes: 2
Views: 899
Reputation: 394389
You can do it like this:
In[75]:
df['Position'] = df['Position'] + df.groupby('Position')['Position'].rank(method='first')/10
df
Out[75]:
Position Menge
0 200.1 3
1 200.2 3
2 200.3 3
3 400.1 7
4 400.2 7
5 400.3 7
6 400.4 7
7 400.5 7
8 400.6 7
9 400.7 7
So here I groupby
on 'Position'
and call rank
with param method='first'
so that equal values are ranked in order of appearance, this effectively ranks the values in order which is the same as a counter
Your error comes from this:
counter=selected['Menge']
and then doing a comparison using:
while counter>=1:
So the error is telling you that the it doesn't understand how to interpret a Series
as doing counter >=
returns a boolean Series, it wants a scalar boolean value to interpret. You'd have to iterate row-wise so you get a scalar value in order to interpret correctly, besides, you should look to avoid loops where possible as it's slow
EDIT
Based on your new data, you just groupby
on multiple columns:
In[81]:
df['Position'] = df['Position'] + df.groupby(['Position','Menge'])['Position'].rank(method='first')/10
df
Out[81]:
Position Menge Product
0 200.1 3 a
1 200.2 3 a
2 200.3 3 a
3 400.1 7 b
4 400.2 7 b
5 400.3 7 b
6 400.4 7 b
7 400.5 7 b
8 400.6 7 b
9 400.7 7 b
10 200.1 4 c
11 200.2 4 c
12 200.3 4 c
EDIT
OK, to handle the situation where you have more than 10 occurrences so that it avoids doing 201
for instance:
In[98]:
df['Position'] = (df['Position'].astype(str) + '.' + df.groupby(['Position','Menge'])['Position'].rank(method='first').astype(int).astype(str)).astype(float)
df
Out[98]:
Position Menge Product
0 200.1 3 a
1 200.2 3 a
2 200.3 3 a
3 400.1 7 b
4 400.2 7 b
5 400.3 7 b
6 400.4 7 b
7 400.5 7 b
8 400.6 7 b
9 400.7 7 b
10 200.1 4 c
11 200.2 4 c
12 200.3 4 c
13 200.4 4 c
So this converts the output from rank
first to an int
as it produces a float
, then casts to a str
so we can just create a string e.g. '200' + '.' + '1'
and then cast back to a float
if necessary
Upvotes: 2
Reputation: 4638
import pandas as pd
df = pd.DataFrame({'Position':[200,200,200,400,400,400,400,400,400,400],'Menge':[3,3,3,7,7,7,7,7,7,7]})
for pos in df.Position.unique():
counter = 0.1
for idx,row in df.iterrows():
if row['Position'] == pos:
df.at[idx,'Position_1'] = df.at[idx,'Position']+counter
counter+=0.1
df.drop(['Position'],axis=1,inplace=True)
df.columns = ['Menge','Position']
Output:
print(df)
Menge Position
0 3 200.1
1 3 200.2
2 3 200.3
3 7 400.1
4 7 400.2
5 7 400.3
6 7 400.4
7 7 400.5
8 7 400.6
9 7 400.7
Upvotes: 0