Reputation: 49
I have a dataframe like this:
| | Vowel | Number |
|---:|:--------|---------:|
| 0 | a | 2 |
| 1 | b | 3 |
| 2 | c | 4 |
| 3 | a | 4 |
| 4 | a | 8 |
| 5 | b | 2 |
| 6 | c | 5 |
| 7 | c | 9 |
I want to add a Diff column that holds, for each row, the difference between its Number and the Number of the previous row with the same Vowel. This is the output I want:
| | Vowel | Number | Diff |
|---:|:--------|---------:|-------:|
| 0 | a | 2 | nan |
| 1 | b | 3 | nan |
| 2 | c | 4 | nan |
| 3 | a | 4 | 2 |
| 4 | a | 8 | 4 |
| 5 | b | 2 | -1 |
| 6 | c | 5 | 1 |
| 7 | c | 9 | 4 |
For example, looking at the value 'a' in the Vowel column: the first 'a' gets nan because there is no earlier 'a' row to take a Number from. The second 'a' gets 2 because 4 - 2 = 2 (Number column), and the third 'a' gets 4 because 8 - 4 = 4.
I'm doing something like this:
for i in list(set(df['Vowel'])):
    one_vowel = df[df['Vowel'] == i]
    for n in one_vowel['Number'].diff():
        print(f'{i} and {n}')
result:
b and nan
b and -1.0
a and nan
a and 2.0
a and 4.0
c and nan
c and 1.0
c and 4.0
But this prints the values grouped by vowel, and I want them in the original row order of the dataframe, as a new column.
Can somebody help me, please?
Upvotes: 0
Views: 26
Reputation: 8302
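Try this. Grouping by Vowel and calling diff() on Number computes the difference within each group, and the result keeps the original index, so it can be assigned straight back as a column: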
df['Diff'] = df.groupby('Vowel')['Number'].diff()
Output (the new Diff column):
0 NaN
1 NaN
2 NaN
3 2.0
4 4.0
5 -1.0
6 1.0
7 4.0
Name: Diff, dtype: float64
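For a self-contained check, here is a minimal sketch that rebuilds the dataframe from the question (the constructor call is my assumption based on the table shown) and applies the same groupby/diff line. The Diff_check column is a hypothetical name I added only to illustrate that the result should match subtracting the previous Number within each Vowel group.

```python
import pandas as pd

# Rebuild the question's data (assumed from the table in the question).
df = pd.DataFrame({
    "Vowel": ["a", "b", "c", "a", "a", "b", "c", "c"],
    "Number": [2, 3, 4, 4, 8, 2, 5, 9],
})

# diff() runs separately inside each Vowel group; the result is aligned
# to the original index, so the row order of df is preserved.
df["Diff"] = df.groupby("Vowel")["Number"].diff()

# Equivalent formulation, for illustration only (hypothetical column name):
# current Number minus the previous Number within the same group.
df["Diff_check"] = df["Number"] - df.groupby("Vowel")["Number"].shift()

print(df)
```

The first row of each group is NaN because it has no earlier row in its group, which is also why the column comes back as float64 rather than int.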
Upvotes: 1