Vityata
Vityata

Reputation: 43595

Adding column with boolean values, based on a month in pandas data frame

I am trying to get a 1 for the row 2020-01, only when the row crosses the column "Jan". And the same for every month:

Thus, all together, once per row, there should be a single 1 and a few 0, based on the months. This is what I tried and the result is the screenshot, without the blue edit.

columns = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]

for i in range(len(columns)):
    df[columns[i]] = df.TIME.astype(str).str[5] + df.TIME.astype(str).str[6]
df

The blue edit is the target.

enter image description here

I have tried a ternary operators like this:

for i in range(len(columns)):
    df[columns[i]] = 1 if (df.TIME.astype(str).str[5] + df.TIME.astype(str).str[6] == "01") else 0

and the error is:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 1

Views: 269

Answers (1)

Chris Adams
Chris Adams

Reputation: 18647

You can use datetime.dt.stftime with %b formatter, get_dummies, reindex and join back onto original DataFrame:

# Example setup
columns = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]

df = pd.DataFrame({'TIME': ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09']})    

df.join(pd.to_datetime(df['TIME']).dt.strftime('%b')
        .str.get_dummies()
        .reindex(columns=columns, fill_value=0))

[out]

      TIME  Jan  Feb  Mar  Apr  May  Jun  Jul  Aug  Sep  Oct  Nov
0  2020-01    1    0    0    0    0    0    0    0    0    0    0
1  2019-12    0    0    0    0    0    0    0    0    0    0    0
2  2019-11    0    0    0    0    0    0    0    0    0    0    1
3  2019-10    0    0    0    0    0    0    0    0    0    1    0
4  2019-09    0    0    0    0    0    0    0    0    1    0    0

Edit

I've only added this since you requested it specifically... Here is an example of how you could loop through a dataframe and columns to update values - again I'll just reiterate that this is not what I'd recommend personally and is very inefficient comapred with the above:

import datetime as dt

columns = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov"]

df = pd.DataFrame({'TIME': ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09']})

for c in columns:
    for i, t in df['TIME'].iteritems():
        if dt.datetime.strptime(t, '%Y-%m').strftime('%b') == c:
            df.loc[i, c] = 1
        else:
            df.loc[i, c] = 0

Upvotes: 2

Related Questions