Johnny

Reputation: 43

Adding a boolean value column to a dataframe containing variable and time columns

I have a dataframe with two columns: the variable name and the time instance at which that variable appears, as follows:

Variable  Time
v1        t1
v2        t2
v3        t3

I would like to add a third column holding a boolean value (1 or 0): 1 means the variable appeared at that time instance, and 0 marks the next time instance (t+1), outside its appearance. Something like this:

Variable  Time   Value
v1        t1     1
v1        t1+1   0
v2        t2     1
v2        t2+1   0
v3        t3     1
v3        t3+1   0

Any ideas on how I would achieve this with a dataframe in Python?
Cheers.

Upvotes: 1

Views: 621

Answers (2)

jezrael

Reputation: 862721

If the Variable column is sorted, use duplicated() to build a mask, invert it with ~, and cast it to int, so that True becomes 1 and False becomes 0:

print (df)
  Variable  Time
0       v1     3
1       v1     4
2       v2     7
3       v2     8
4       v3     3
5       v3     4
6       v3     5

df['Value'] = (~df['Variable'].duplicated()).astype(int)
print (df)
  Variable  Time  Value
0       v1     3      1
1       v1     4      0
2       v2     7      1
3       v2     8      0
4       v3     3      1
5       v3     4      0
6       v3     5      0
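
For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the printout above. The second column, Value_alt, is a name I made up for illustration; it uses groupby with cumcount as an alternative way to flag the first row of each variable and is not part of the approach above.

```python
import pandas as pd

# Sample data rebuilt from the printed frame above
df = pd.DataFrame({
    'Variable': ['v1', 'v1', 'v2', 'v2', 'v3', 'v3', 'v3'],
    'Time': [3, 4, 7, 8, 3, 4, 5],
})

# duplicated() is False for the first occurrence of each Variable and True
# for every later one, so the inverted mask cast to int gives 1 then 0s
df['Value'] = (~df['Variable'].duplicated()).astype(int)

# Alternative (hypothetical column name): cumcount() numbers the rows inside
# each Variable group from 0, so position 0 is the first appearance
df['Value_alt'] = df.groupby('Variable').cumcount().eq(0).astype(int)

print(df)
```

Both columns should agree on this data. Note that either approach flags the first row per variable in whatever order the rows are in, so if "first" is meant to be the earliest time, sort by Time beforehand.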

Upvotes: 1

zipa

Reputation: 27869

If you are using pandas, this will do what you asked for:

import pandas as pd

df = pd.DataFrame({'Variable': ['v1', 'v2', 'v3'], 'Time': ['t1', 't2', 't3']})
df['Value'] = 1

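# Copy the rows, label the copies as the following time step, and set Value to 0
df2 = df.copy()
df2['Time'] = df2['Time'] + '+1'
df2['Value'] = 0

# DataFrame.append was removed in pandas 2.0, so use pd.concat instead;
# sorting on both columns keeps each t row ahead of its t+1 row
df = pd.concat([df, df2]).sort_values(['Variable', 'Time']).reset_index(drop=True)

df = df[['Variable', 'Time', 'Value']]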
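
If Time holds actual numbers rather than string labels like 't1', the t+1 rows can be computed with plain addition. A rough sketch of that variant, assuming integer times (the sample values and the names on, off and out are made up for illustration):

```python
import pandas as pd

# Hypothetical numeric version of the question's frame
df = pd.DataFrame({'Variable': ['v1', 'v2', 'v3'], 'Time': [3, 7, 3]})

# Rows where the variable appears get Value 1
on = df.assign(Value=1)

# The following time step (t + 1) gets Value 0
off = df.assign(Time=df['Time'] + 1, Value=0)

# Stack both sets and order each variable's rows by time
out = (pd.concat([on, off], ignore_index=True)
         .sort_values(['Variable', 'Time'])
         .reset_index(drop=True))

print(out)
```

Using assign leaves the original df untouched, which makes it easier to rerun the steps while experimenting.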

Upvotes: 0
