Reputation: 903
This is my dataframe:
id Year Month Day Instant Temperature DayType DayValidity LoadNette
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
I want to duplicate my dataframe 5 times, but with weights for some rows based on Month. For example, rows where Month is 4 would be duplicated only 3 times, and rows for another month only 2 times. With Python, the result should look something like this:
id Year Month Day Instant Temperature DayType DayValidity LoadNette
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
196 2008 5 5 4 8.03 6 1 50848.16566
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
195 2008 4 5 3 8.07 6 1 51907.94746
192 2008 1 5 0 8.03 6 1 53039.77133
193 2008 2 5 1 8.07 6 1 52200.71569
194 2008 3 5 2 8.10 6 1 51681.17260
Is there any way to do it?
Upvotes: 1
Views: 602
Reputation: 862591
You can use a dict that maps each Month to its number of repeats, together with numpy.repeat and a dict comprehension:
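import numpy as np
import pandas as pd
# Number of repeats for each Month value
d = {1:5, 2:2, 3:1, 4:3, 5:3}
# Per-row repeat count, looked up from Month
l = df['Month'].map(d)
df = pd.DataFrame({col: np.repeat(df[col], l) for col in df.columns}, columns=df.columns)
print (df)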
id Year Month Day Instant Temperature DayType DayValidity \
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
0 192 2008 1 5 0 8.03 6 1
1 193 2008 2 5 1 8.07 6 1
1 193 2008 2 5 1 8.07 6 1
2 194 2008 3 5 2 8.10 6 1
3 195 2008 4 5 3 8.07 6 1
3 195 2008 4 5 3 8.07 6 1
3 195 2008 4 5 3 8.07 6 1
4 196 2008 5 5 4 8.03 6 1
4 196 2008 5 5 4 8.03 6 1
4 196 2008 5 5 4 8.03 6 1
LoadNette
0 53039.77133
0 53039.77133
0 53039.77133
0 53039.77133
0 53039.77133
1 52200.71569
1 52200.71569
2 51681.17260
3 51907.94746
3 51907.94746
3 51907.94746
4 50848.16566
4 50848.16566
4 50848.16566
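For comparison, here is a minimal self-contained sketch of the same per-row repetition done with `Index.repeat` and `.loc` instead of a dict comprehension. It is an alternative idiom, not the approach shown above; the variable names `counts` and `repeated` are only illustrative, and the data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Year": [2008] * 5,
    "Month": [1, 2, 3, 4, 5],
    "Day": [5] * 5,
    "Instant": [0, 1, 2, 3, 4],
    "Temperature": [8.03, 8.07, 8.10, 8.07, 8.03],
    "DayType": [6] * 5,
    "DayValidity": [1] * 5,
    "LoadNette": [53039.77133, 52200.71569, 51681.17260,
                  51907.94746, 50848.16566],
})

# Same Month -> repeat-count mapping as in the answer above.
d = {1: 5, 2: 2, 3: 1, 4: 3, 5: 3}
counts = df["Month"].map(d)

# Repeat each index label counts[i] times, then select those rows.
repeated = df.loc[df.index.repeat(counts)].reset_index(drop=True)
print(repeated["id"].tolist())
```

Note that a Month missing from `d` would map to NaN and make the repeat fail, so the dict probably needs an entry for every month present in the data.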
Another solution, if you need to repeat all rows 5 times, is concat:
df = pd.concat([df] * 5, ignore_index=True)
print (df)
id Year Month Day Instant Temperature DayType DayValidity \
0 192 2008 1 5 0 8.03 6 1
1 193 2008 2 5 1 8.07 6 1
2 194 2008 3 5 2 8.10 6 1
3 195 2008 4 5 3 8.07 6 1
4 196 2008 5 5 4 8.03 6 1
5 192 2008 1 5 0 8.03 6 1
6 193 2008 2 5 1 8.07 6 1
7 194 2008 3 5 2 8.10 6 1
8 195 2008 4 5 3 8.07 6 1
9 196 2008 5 5 4 8.03 6 1
10 192 2008 1 5 0 8.03 6 1
11 193 2008 2 5 1 8.07 6 1
12 194 2008 3 5 2 8.10 6 1
13 195 2008 4 5 3 8.07 6 1
14 196 2008 5 5 4 8.03 6 1
15 192 2008 1 5 0 8.03 6 1
16 193 2008 2 5 1 8.07 6 1
17 194 2008 3 5 2 8.10 6 1
18 195 2008 4 5 3 8.07 6 1
19 196 2008 5 5 4 8.03 6 1
20 192 2008 1 5 0 8.03 6 1
21 193 2008 2 5 1 8.07 6 1
22 194 2008 3 5 2 8.10 6 1
23 195 2008 4 5 3 8.07 6 1
24 196 2008 5 5 4 8.03 6 1
LoadNette
0 53039.77133
1 52200.71569
2 51681.17260
3 51907.94746
4 50848.16566
5 53039.77133
6 52200.71569
7 51681.17260
8 51907.94746
9 50848.16566
10 53039.77133
11 52200.71569
12 51681.17260
13 51907.94746
14 50848.16566
15 53039.77133
16 52200.71569
17 51681.17260
18 51907.94746
19 50848.16566
20 53039.77133
21 52200.71569
22 51681.17260
23 51907.94746
24 50848.16566
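If the block-wise order from the question matters (whole copies of the frame, with some months left out of the later copies), the two ideas can probably be combined. The sketch below is only an assumption about what was wanted: the counts in `copies` are read off the desired output in the question (5 copies for months 1 to 3, 4 for month 4, 3 for month 5), and only two columns are kept for brevity.

```python
import pandas as pd

# Two columns from the question's data, for brevity.
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
})

# Copies per Month, inferred from the question's desired output (assumption).
copies = {1: 5, 2: 5, 3: 5, 4: 4, 5: 3}
counts = df["Month"].map(copies)

# The k-th pass keeps only the rows that still need a k-th copy.
passes = [df[counts >= k] for k in range(1, int(counts.max()) + 1)]
result = pd.concat(passes, ignore_index=True)
print(result["id"].tolist())
```

Each pass is a filtered copy of the original frame, so rows keep their original relative order inside every block.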
Upvotes: 3
Reputation: 2944
Use the pandas DataFrame.sample function with weights. The syntax:
#vec = <vector of rows weights>
df.sample(weights = vec)
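A slightly fuller sketch of that idea, with the caveat that `sample` draws rows at random, so the weights only control how often each row appears on average, not an exact number of copies. The weight values, `n=20`, and `random_state=0` are all hypothetical choices for illustration.

```python
import pandas as pd

# Two columns from the question's data, for brevity.
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
})

# Hypothetical weight per Month; sample() normalizes weights to sum to 1.
vec = df["Month"].map({1: 5, 2: 5, 3: 5, 4: 3, 5: 2})

# Draw 20 rows with replacement; higher-weight rows are more likely to be drawn.
sampled = df.sample(n=20, replace=True, weights=vec, random_state=0)
sampled = sampled.reset_index(drop=True)
print(sampled["id"].value_counts())
```

If exact repeat counts are required, a deterministic approach based on repeating rows is likely the safer choice.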
Upvotes: 1