Reputation: 43
I have a DataFrame with two columns: the variable name and the time instant at which that variable appears, as follows:
Variable Time
v1 t1
v2 t2
v3 t3
I would like to add a third column holding a binary value (1 or 0): 1 means the variable appeared at that time instant, and 0 marks the next instant (t+1), outside that appearance. Something like this:
Variable Time Value
v1 t1 1
v1 t1+1 0
v2 t2 1
v2 t2+1 0
v3 t3 1
v3 t3+1 0
Any ideas on how I could achieve this with a DataFrame in Python?
Cheers.
Upvotes: 1
Views: 621
Reputation: 862721
If the Variable column is sorted, use duplicated to build a mask, invert it with ~, and cast to int, so that True becomes 1 and False becomes 0:
print (df)
Variable Time
0 v1 3
1 v1 4
2 v2 7
3 v2 8
4 v3 3
5 v3 4
6 v3 5
df['Value'] = (~df['Variable'].duplicated()).astype(int)
print (df)
Variable Time Value
0 v1 3 1
1 v1 4 0
2 v2 7 1
3 v2 8 0
4 v3 3 1
5 v3 4 0
6 v3 5 0
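For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The sample data simply mirrors the printed frame above, and the intermediate name first_seen is only illustrative.

```python
import pandas as pd

# Sample data mirroring the frame printed above
df = pd.DataFrame({
    'Variable': ['v1', 'v1', 'v2', 'v2', 'v3', 'v3', 'v3'],
    'Time': [3, 4, 7, 8, 3, 4, 5],
})

# duplicated() is False for the first occurrence of each name and True for
# later ones, so inverting it marks the first row per variable
first_seen = ~df['Variable'].duplicated()

# Casting the boolean mask to int turns True/False into 1/0
df['Value'] = first_seen.astype(int)
print(df)
```

Note that this only flags rows that already exist in the frame; it does not create the t+1 rows.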
Upvotes: 1
Reputation: 27869
If you are using pandas, this will do what you asked for:
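import pandas as pd
df = pd.DataFrame({'Variable': ['v1', 'v2', 'v3'], 'Time': ['t1', 't2', 't3']})
# rows for the instant at which each variable appears
df['Value'] = 1
# copy of the rows shifted to the next instant, where the value is 0
df2 = df.copy()
df2['Time'] = df['Time'] + '+1'
df2['Value'] = 0
# DataFrame.append was removed in pandas 2.0, so use pd.concat;
# mergesort is stable, which keeps t before t+1 within each variable
df = pd.concat([df, df2]).sort_values('Variable', kind='mergesort').reset_index(drop=True)
df = df[['Variable', 'Time', 'Value']]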
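If Time holds actual numbers rather than labels like 't1', the same construction works with plain arithmetic. This is only a sketch under that assumption: the sample times and the names first, after and out are made up for illustration.

```python
import pandas as pd

# Assumption: Time is numeric (the values here are invented for the example)
df = pd.DataFrame({'Variable': ['v1', 'v2', 'v3'], 'Time': [3, 7, 3]})

# Rows for the instant at which each variable appears
first = df.assign(Value=1)

# Matching rows one step later, flagged with 0
after = df.assign(Time=df['Time'] + 1, Value=0)

# Stack both sets and order each variable's rows by time
out = (pd.concat([first, after])
         .sort_values(['Variable', 'Time'])
         .reset_index(drop=True))
print(out)
```

Sorting on both Variable and Time makes the row order explicit, so it does not depend on the order in which the two frames were concatenated.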
Upvotes: 0