Beate

Reputation: 23

Strip prefix from all variable names in SPSS

I have a question similar to the one asked here (Strip suffix from all variable names in SPSS). The answers there already helped a lot, but one question remains.

I have a dataset in which every variable name has the prefix "v23_1_". I want to remove this prefix from all variables, but there are hundreds of them, so I am looking for a way to do it without using the RENAME statement hundreds of times.

I used this code:

begin program.
vdict=spssaux.VariableDict()
mylist=vdict.range(start="v23_1_dg_mnpdocid", end="v23_1_phq9t0_asku3t0")
nvars = len(mylist)

for i in range(nvars):
    myvar = mylist[i]
    mynewvar = myvar.strip("v23_1_")
    spss.Submit(r"""
        rename variables ( %s = %s) .
                        """ %(myvar, mynewvar))
end program.

Here is a list of the first few variables:

v23_1_dg_mnppusid
v23_1_dg_sigstatus
v23_1_dg_mnpvsno
v23_1_dg_mnpvslbl
v23_1_dg_mnpcvpid
v23_1_dg_mnpvisid
v23_1_dg_mnpvisno
v23_1_dg_mnpvispdt
v23_1_dg_mnpvisfdt
v23_1_dg_mnpfs0
v23_1_dg_mnpfs1
v23_1_dg_mnpfs2
v23_1_dg_mnpfs3
v23_1_dg_mnpfcs0
v23_1_dg_mnpfcs1
v23_1_dg_mnpfcs2

It worked for the first variables but then stopped with the message "renaming has created two variables named dg_mnpfs". After stripping, though, the next variable should have the name "dg_mnpfs2". What has happened is that the 1 at the end of "v23_1_dg_mnpfs1" gets deleted too. It then probably tries to delete the 2 at the end of "v23_1_dg_mnpfs2" as well, which leads to the same variable name. I don't understand why this is happening or how I can avoid it.

Thanks a lot for your support! Kind regards, Beate

Upvotes: 2

Views: 1284

Answers (2)

eli-k

Reputation: 11350

Here's a version of the process using an SPSS macro. SPSSINC SELECT VARIABLES lets you collect all the relevant variables into a list, whatever order they are in, without naming them in the command:

* this just creates some sample data to play with.
data list list/v23_1_var1 to v23_1_var6.
begin data
end data.

The following creates a list of the relevant variables:

SPSSINC SELECT VARIABLES MACRONAME="!list" /PROPERTIES  PATTERN = "v23_1_*".
* the following macro creates one rename command for the whole list.
define !doRename ()
rename variables (!eval(!list)=!do !i !in(!eval(!list)) !substr(!i, 7) !doend).
!enddefine.
!doRename .

Upvotes: 1

horace_vr

Reputation: 3166

As your syntax looks right now, it runs on a variable-by-variable basis: you are submitting the RENAME VARIABLES command as many times as there are variables in your list. On one hand, this is inefficient, as it takes longer to run than what I am suggesting below. On the other (and more important) hand, doing it variable by variable does not guard against duplicate variables. I am guessing that you already have a variable named dg_mnpfs in your datafile, and you are attempting to create a new one by renaming v23_1_dg_mnpfs. Just check your datafile after your Python code breaks.

A more efficient way of writing your code would be to create one list with the old names and one with the new names, and submit the syntax with only one command.

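begin program.
import spss, spssaux

prefix = "v23_1_"
vdict = spssaux.VariableDict()
mylist = vdict.range(start="v23_1_dg_mnpdocid", end="v23_1_phq9t0_asku3t0")
# keep only the names that really start with the prefix
mylist = [myvar for myvar in mylist if myvar.startswith(prefix)]

# slice the prefix off; strip() would remove any of its characters from both ends
my_new_list = [myvar[len(prefix):] for myvar in mylist]

# submit a single RENAME VARIABLES command for the whole list
my_syntax = "rename variables (" + " ".join(mylist) + " = " + " ".join(my_new_list) + ")."
spss.Submit(my_syntax)
end program.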
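
The command string can also be built and checked in plain Python before SPSS ever sees it. The following is only a sketch: build_rename_syntax is a hypothetical helper (it is not part of spss or spssaux), and the duplicate check assumes that variable names should be compared without regard to case. It returns the text you would pass to spss.Submit and raises an error if two variables would end up with the same name.

```python
def build_rename_syntax(names, prefix):
    """Return one RENAME VARIABLES command that drops `prefix` from `names`.

    Hypothetical helper for illustration; not part of spss or spssaux.
    """
    # Only names that really start with the prefix are renamed.
    old_names = [n for n in names if n.startswith(prefix)]
    # Slice the prefix off instead of calling strip(), which works on characters.
    new_names = [n[len(prefix):] for n in old_names]

    # Refuse to build a command that would produce duplicate names.
    # Assumption: names are compared case-insensitively.
    untouched = [n for n in names if not n.startswith(prefix)]
    seen = set(n.lower() for n in untouched)
    for new in new_names:
        if new.lower() in seen:
            raise ValueError("renaming would create a duplicate: " + new)
        seen.add(new.lower())

    return "rename variables (%s = %s)." % (" ".join(old_names), " ".join(new_names))


sample = ["v23_1_dg_mnpfs0", "v23_1_dg_mnpfs1", "v23_1_dg_mnpfs2"]
print(build_rename_syntax(sample, "v23_1_"))
```

With the three sample names from the question this produces a single command that maps them to dg_mnpfs0, dg_mnpfs1 and dg_mnpfs2.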

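And one more thing: strip does not remove a prefix. It treats its argument as a set of characters and removes any of them from both ends of the name, which is why the trailing 1 in v23_1_dg_mnpfs1 disappears. lstrip only works on the left end, but it still removes characters rather than a substring, so slicing the prefix off (for example myvar[len("v23_1_"):]) is the safer choice. Details can be found here, in the official documentation.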
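
To see the difference outside SPSS, here is a small plain-Python sketch. The helper name remove_prefix is made up for this example; the variable names are taken from the question and from the sample data in the other answer.

```python
prefix = "v23_1_"

# strip() removes any of the characters v, 2, 3, _, 1 from BOTH ends,
# so two different variables collapse to the same name.
print("v23_1_dg_mnpfs1".strip(prefix))   # dg_mnpfs
print("v23_1_dg_mnpfs2".strip(prefix))   # dg_mnpfs

# lstrip() only touches the left end, but it still removes characters,
# so it also eats the leading "v" of "var1".
print("v23_1_var1".lstrip(prefix))       # ar1


def remove_prefix(name, prefix):
    """Hypothetical helper: drop `prefix` only if `name` starts with it."""
    if name.startswith(prefix):
        return name[len(prefix):]
    return name


print(remove_prefix("v23_1_dg_mnpfs1", prefix))  # dg_mnpfs1
print(remove_prefix("v23_1_var1", prefix))       # var1
```

Slicing by the prefix length leaves the rest of the name untouched, whatever characters it contains.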

Upvotes: 2
