Reputation: 2620
I have a tab-separated file like this:
text_id text task_1 task_2 task_3
hasoc_en_1 in the know as nation's pride is involved lorem NOT NONE NONE
hasoc_en_2 admitted to treason . #TrumpIsATraitor #McCainsAHero #JohnMcCainDay HOF HATE TIN
I can read it into a DataFrame like this:
df=pd.read_csv(r"c:\Users\asd\Desktop\dd\english_dataset\english_dataset.tsv", sep='\t', header=0)
I want to have all unique values in task_1, task_2 and task_3 as column headers and 1 or 0 as row value, for example:
text_id text NOT HOF NONE HATE TIN
hasoc_en_1 in the know as nation's pride is involved lorem 1 0 1 0 0
hasoc_en_2 admitted to treason . #TrumpIsATraitor #McCainsAHero #JohnMcCainDay 0 1 0 1 1
Is there a built-in function or an easy way to do this, or do I have to loop through one DataFrame and insert the values into another? Any suggestions?
Upvotes: 0
Views: 30
Reputation: 976
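You can use pandas.get_dummies() (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.get_dummies.html) on each task column separately, then combine the resulting indicator columns and take the max over columns that share a label (such as NONE, which appears in both task_2 and task_3) to get the form you are asking for.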
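A minimal sketch of that idea, assuming the column names from the question. The sample frame is built inline instead of being read from the TSV, the text values are shortened, and the names task_cols, parts, onehot and result are only illustrative. Passing dtype=int asks get_dummies for 0/1 rather than booleans, and grouping the transposed frame by label is one way to merge the duplicate NONE columns with max.

```python
import pandas as pd

# Small stand-in for the TSV in the question (text shortened for brevity)
df = pd.DataFrame(
    {
        "text_id": ["hasoc_en_1", "hasoc_en_2"],
        "text": ["in the know as nation's pride is involved", "admitted to treason ."],
        "task_1": ["NOT", "HOF"],
        "task_2": ["NONE", "HATE"],
        "task_3": ["NONE", "TIN"],
    }
)

task_cols = ["task_1", "task_2", "task_3"]

# One indicator frame per task column, with the raw labels as column names
parts = [pd.get_dummies(df[col], dtype=int) for col in task_cols]

# Side by side, the same label (here NONE) can appear more than once ...
stacked = pd.concat(parts, axis=1)

# ... so group the duplicate labels and keep the max (1 if any task had it)
onehot = stacked.T.groupby(level=0).max().T

# Put the indicator columns back next to the id and text columns
result = pd.concat([df[["text_id", "text"]], onehot], axis=1)
print(result)
```

The indicator columns come out in alphabetical order (HATE, HOF, NONE, NOT, TIN); if you need a specific order, select the columns in that order afterwards.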
Upvotes: 1