Anna Nikman
Anna Nikman

Reputation: 23

Transform string variable into 0-1 columns

As a very begginer in SPSS I would ask you for help with some transformation from table A into table B. I have to recode values of "brand" variable into columns and make 0-1 variables.

#table A#
nr  brand
1   GREEN CARE PROFESSIONAL
1   GREEN CARE PROFESSIONAL
1   GREEN CARE PROFESSIONAL
2   HENKEL
3   HENKEL
3   HENKEL
3   HENKEL
3   VIZIR
4   BIEDRONKA
4   BOBINI
4   BOBINI
4   BOBINI
4   BOBINI
4   BOBINI
4   HENKEL
5   VIZIR
6   HENKEL

#table B#
nr GREEN HENKEL VIZIR BIEDR BOBINI
1   1   0   0   0   0
1   1   0   0   0   0
1   1   1   0   0   0
2   0   1   0   0   0
3   0   1   0   0   0
3   0   1   0   0   0
3   0   1   0   0   0
3   0   0   1   0   0
4   0   0   0   1   0
4   0   0   0   0   1
4   0   0   0   0   1
4   0   0   0   0   1
4   0   0   0   0   1
4   0   0   0   0   1
4   0   1   0   0   0
5   0   0   1   0   0
6   0   1   0   0   0

I can do it in this particular case in this simple way:

compute HENKEL=0.
...
do if BRAND='GREEN_CARE' .
compute GREEN_CARE=1.
else if ....

but the loop has to be usable with another variable and different number of values ect. I was trying to make it all day and gave up.

Do you have any idea to make it in a easy way? Thanks!

Upvotes: 2

Views: 129

Answers (1)

eli-k
eli-k

Reputation: 11310

The following syntax does the job on the sample data you provided. First, let's recreate the sample data to demonstrate on:

Data list list/nr (f1) brand (a30).
begin data
1   "GREEN CARE PROFESSIONAL"
1   "GREEN CARE PROFESSIONAL"
1   "GREEN CARE PROFESSIONAL"
2  "HENKEL"
3   "HENKEL"
3   "HENKEL"
3   "HENKEL"
3   "VIZIR"
4   "BIEDRONKA"
4   "BOBINI"
4   "BOBINI"
4  "BOBINI"
4   "BOBINI"
4   "BOBINI"
4   "HENKEL"
5   "VIZIR"
6   "HENKEL"
end data.
dataset name originalDataset.

Now for the restructure.

sort cases by nr brand.
* creating an index to enumerate cases for each combination of `nr` and `brand`.
* This is necessary for the `casestovars` command to work later.
compute ind=1.
if $casenum>1 and lag(nr)=nr and lag(brand)=brand ind=lag(ind)+1.
exe.
* variable names can't have spaces in them, so changing the category names accordingly.
compute brand=replace(rtrim(brand)," ","_").
sort cases by nr ind brand.
compute exist=1.
casestovars  /id=nr ind /index= brand/autofix=no.

Upvotes: 1

Related Questions