Amal Kostali Targhi

Reputation: 903

Give weights to some rows based on a column

This is my dataframe:

id   Year   Month   Day     Instant     Temperature     DayType     DayValidity     LoadNette   
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     
195     2008    4   5   3   8.07    6   1   51907.94746     
196     2008    5   5   4   8.03    6   1   50848.16566     

I want to duplicate my dataframe 5 times, but with weights for some rows based on Month. For example, rows where Month is 4 would be duplicated only 3 times, and rows for another month only 2 times, like this, in Python:

id   Year   Month   Day     Instant     Temperature     DayType     DayValidity     LoadNette   
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     
195     2008    4   5   3   8.07    6   1   51907.94746     
196     2008    5   5   4   8.03    6   1   50848.16566
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     
195     2008    4   5   3   8.07    6   1   51907.94746     
196     2008    5   5   4   8.03    6   1   50848.16566
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     
195     2008    4   5   3   8.07    6   1   51907.94746     
196     2008    5   5   4   8.03    6   1   50848.16566
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     
195     2008    4   5   3   8.07    6   1   51907.94746     
192     2008    1   5   0   8.03    6   1   53039.77133     
193     2008    2   5   1   8.07    6   1   52200.71569     
194     2008    3   5   2   8.10    6   1   51681.17260     

Is there any way to do it?

Upvotes: 1

Views: 602

Answers (2)

jezrael

Reputation: 862591

You can use a dict that maps each Month to its number of repeats, together with numpy.repeat and a dict comprehension:

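import numpy as np
import pandas as pd
# Month -> number of times each row with that Month is repeated
d = {1:5, 2:2, 3:1, 4:3, 5:3}
l = df['Month'].map(d)
df = pd.DataFrame({col: np.repeat(df[col], l) for col in df.columns}, columns=df.columns)

print(df)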
    id  Year  Month  Day  Instant  Temperature  DayType  DayValidity  \
0  192  2008      1    5        0         8.03        6            1   
0  192  2008      1    5        0         8.03        6            1   
0  192  2008      1    5        0         8.03        6            1   
0  192  2008      1    5        0         8.03        6            1   
0  192  2008      1    5        0         8.03        6            1   
1  193  2008      2    5        1         8.07        6            1   
1  193  2008      2    5        1         8.07        6            1   
2  194  2008      3    5        2         8.10        6            1   
3  195  2008      4    5        3         8.07        6            1   
3  195  2008      4    5        3         8.07        6            1   
3  195  2008      4    5        3         8.07        6            1   
4  196  2008      5    5        4         8.03        6            1   
4  196  2008      5    5        4         8.03        6            1   
4  196  2008      5    5        4         8.03        6            1   

     LoadNette  
0  53039.77133  
0  53039.77133  
0  53039.77133  
0  53039.77133  
0  53039.77133  
1  52200.71569  
1  52200.71569  
2  51681.17260  
3  51907.94746  
3  51907.94746  
3  51907.94746  
4  50848.16566  
4  50848.16566  
4  50848.16566  
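
The same per-month repetition can also be written by repeating the index labels and selecting them with loc, which avoids rebuilding the frame column by column. This is a minimal sketch of that alternative, not the code above. The three-column stand-in frame and the names `repeats` and `out` are assumptions made for illustration, and every Month value is assumed to have an entry in `d`.

```python
import pandas as pd

# Small stand-in for the question's frame (only three columns, for illustration).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
    "LoadNette": [53039.77133, 52200.71569, 51681.17260, 51907.94746, 50848.16566],
})

# Month -> number of copies, same dict as in the answer above.
d = {1: 5, 2: 2, 3: 1, 4: 3, 5: 3}

# Per-row repeat counts; a Month missing from d would map to NaN and fail below.
repeats = df["Month"].map(d).to_numpy()

# Repeat each index label, select those rows, then renumber the index.
out = df.loc[df.index.repeat(repeats)].reset_index(drop=True)

print(out["Month"].value_counts().sort_index())
```

The trailing `reset_index(drop=True)` is optional; without it the result keeps the repeated index labels, as in the output above.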

Another solution, if you need to repeat all rows 5 times, is concat:

df = pd.concat([df] * 5, ignore_index=True)

print(df)
     id  Year  Month  Day  Instant  Temperature  DayType  DayValidity  \
0   192  2008      1    5        0         8.03        6            1   
1   193  2008      2    5        1         8.07        6            1   
2   194  2008      3    5        2         8.10        6            1   
3   195  2008      4    5        3         8.07        6            1   
4   196  2008      5    5        4         8.03        6            1   
5   192  2008      1    5        0         8.03        6            1   
6   193  2008      2    5        1         8.07        6            1   
7   194  2008      3    5        2         8.10        6            1   
8   195  2008      4    5        3         8.07        6            1   
9   196  2008      5    5        4         8.03        6            1   
10  192  2008      1    5        0         8.03        6            1   
11  193  2008      2    5        1         8.07        6            1   
12  194  2008      3    5        2         8.10        6            1   
13  195  2008      4    5        3         8.07        6            1   
14  196  2008      5    5        4         8.03        6            1   
15  192  2008      1    5        0         8.03        6            1   
16  193  2008      2    5        1         8.07        6            1   
17  194  2008      3    5        2         8.10        6            1   
18  195  2008      4    5        3         8.07        6            1   
19  196  2008      5    5        4         8.03        6            1   
20  192  2008      1    5        0         8.03        6            1   
21  193  2008      2    5        1         8.07        6            1   
22  194  2008      3    5        2         8.10        6            1   
23  195  2008      4    5        3         8.07        6            1   
24  196  2008      5    5        4         8.03        6            1   

      LoadNette  
0   53039.77133  
1   52200.71569  
2   51681.17260  
3   51907.94746  
4   50848.16566  
5   53039.77133  
6   52200.71569  
7   51681.17260  
8   51907.94746  
9   50848.16566  
10  53039.77133  
11  52200.71569  
12  51681.17260  
13  51907.94746  
14  50848.16566  
15  53039.77133  
16  52200.71569  
17  51681.17260  
18  51907.94746  
19  50848.16566  
20  53039.77133  
21  52200.71569  
22  51681.17260  
23  51907.94746  
24  50848.16566  
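
The layout shown in the question differs from both outputs above: it stacks whole copies of the frame and drops some months from the later copies. One possible way to get that ordering is sketched below. The counts in `copies` are read off the question's example output (Month 4 appears 4 times, Month 5 appears 3 times) and are an assumption, as are the reduced stand-in frame and the names `copies`, `n_copies`, `rounds`, and `out`.

```python
import pandas as pd

# Small stand-in for the question's frame (only three columns, for illustration).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
    "LoadNette": [53039.77133, 52200.71569, 51681.17260, 51907.94746, 50848.16566],
})

# Assumed Month -> total number of copies, taken from the question's example.
copies = {1: 5, 2: 5, 3: 5, 4: 4, 5: 3}
n_copies = df["Month"].map(copies)

# Copy number k (0-based) keeps only the rows that need more than k copies.
rounds = [df[n_copies > k] for k in range(int(n_copies.max()))]

# Stack the copies in order and renumber the index.
out = pd.concat(rounds, ignore_index=True)

print(out)
```

With these counts the first three copies contain all five rows, the fourth drops Month 5, and the fifth drops Months 4 and 5.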

Upvotes: 3

Dimgold

Reputation: 2944

Use pandas' DataFrame.sample function with weights. The syntax:

# vec = <vector of row weights, one per row>
# by default a single row is drawn; pass n= or frac= to draw more
df.sample(weights=vec)
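
As a rough illustration of how this might look on the question's data, the sketch below draws rows with replacement using per-month weights. Unlike the repeat-based approaches, the result is random, so each row's count matches its weight only on average. The weight values, the sample size, the reduced stand-in frame, and the names `w`, `sampled`, and `counts` are all assumptions.

```python
import pandas as pd

# Small stand-in for the question's frame (only three columns, for illustration).
df = pd.DataFrame({
    "id": [192, 193, 194, 195, 196],
    "Month": [1, 2, 3, 4, 5],
    "LoadNette": [53039.77133, 52200.71569, 51681.17260, 51907.94746, 50848.16566],
})

# Hypothetical relative weights per Month; sample() rescales them to sum to 1.
w = {1: 5, 2: 5, 3: 5, 4: 3, 5: 2}
vec = df["Month"].map(w)

# Draw 2000 rows with replacement; random_state only makes the run repeatable.
sampled = df.sample(n=2000, replace=True, weights=vec, random_state=0)

# How often each Month was drawn.
counts = sampled["Month"].value_counts()
print(counts.sort_index())
```

If exact counts per month are required, the deterministic repeat approach is the better fit; weighted sampling suits cases where approximate proportions are enough.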

Upvotes: 1
