Andrei

Reputation: 29

Is there a way to save value labels for Stata categorical data within Python?

I know it's possible to read in either the value labels or the underlying values of Stata categorical variables using the convert_categoricals parameter of pandas.read_stata.

I was looking for a way to write/export a pandas DataFrame to Stata and include the value labels. However, all I could find in to_stata was either

data_label : str, optional for the dataset label

or

variable_labels : dict for column names label,

but nothing for the values themselves.

Upvotes: 2

Views: 1378

Answers (3)

Farkhad

Reputation: 11

As of April 2023, pandas lets you pass "value_labels" to pd.DataFrame.to_stata(). If you look at the code of the "to_stata" method, you will find a description of how to add variable labels, a data label, and value labels. Here is an excerpt from that description:

....

value_labels : dict of dicts

Dictionary containing columns as keys and dictionaries of column value to labels as values. Labels for a single variable must be 32,000 characters or smaller.

....

Example: if the column "animals" can take the two values [1, 2] and you want to label them ['Cat', 'Dog'], then pass the following to pd.DataFrame.to_stata():

value_labels = {'animals': {1: 'Cat', 2: 'Dog'}}
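
To make this concrete, here is a minimal round-trip sketch. It assumes a pandas version that supports value_labels (1.4 or later, as far as I know). The file name animals.dta and the temporary directory are placeholders chosen for illustration, not part of the original example.

```python
import os
import tempfile

import pandas as pd

# Toy frame: "animals" holds the numeric codes that should carry labels in Stata.
df = pd.DataFrame({"animals": [1, 2, 2, 1]})

# Placeholder output location (hypothetical path, for illustration only).
path = os.path.join(tempfile.mkdtemp(), "animals.dta")

# Attach the value labels while writing the .dta file.
df.to_stata(path, write_index=False, value_labels={"animals": {1: "Cat", 2: "Dog"}})

# Reading back with the default convert_categoricals=True applies the labels;
# convert_categoricals=False returns the underlying numeric codes instead.
labelled = pd.read_stata(path)
raw = pd.read_stata(path, convert_categoricals=False)

print(labelled["animals"].astype(str).tolist())
print(raw["animals"].tolist())
```

If the labels were stored, the first print should show the label strings and the second the original codes.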

Upvotes: 1

Wouter

Reputation: 3261

The pandas equivalent to a Stata variable with numerically encoded string values is the Categorical dtype. Exporting a Categorical column with the to_stata method will export it as such. Taking the example of Álvaro A. Gutiérrez Vargas:

import pandas as pd

data = [['Eren Jaeger', 15, 1, 'Soldier'], ['Mikasa Ackerman', 14, 1, 'Soldier'], ['Armin Arlert', 14, 1, 'Soldier'], ['Levi Ackerman', 30, 2, 'Captain']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank_num', 'Rank'])
# Convert Rank to a Categorical so it is exported with value labels
df['Rank'] = df['Rank'].astype('category')
# Replace the placeholder with the path of the .dta file to write
df.to_stata('YOUR/PATH/HERE', write_index=False)

This will create a Stata dataset with a Rank variable encoded as 0=Captain, 1=Soldier. One could change the order by using Categorical.reorder_categories() or Categorical.set_categories(), for example:

df['Rank'] = df['Rank'].cat.reorder_categories(['Soldier', 'Captain'], ordered=True)

Now, exporting with the to_stata method will use encoding 0=Soldier, 1=Captain.
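
The encoding can be checked before writing anything, because the integer codes of a Categorical follow the order of its categories. A small sketch, using a standalone Series (rank, a name introduced here) in place of the df['Rank'] column above:

```python
import pandas as pd

# Stand-in for the df['Rank'] column from the example above.
rank = pd.Series(['Soldier', 'Soldier', 'Soldier', 'Captain']).astype('category')

# Default: categories are sorted, so Captain comes first.
default_codes = dict(enumerate(rank.cat.categories))
print(default_codes)  # {0: 'Captain', 1: 'Soldier'}

# After reordering, the codes follow the new category order.
reordered = rank.cat.reorder_categories(['Soldier', 'Captain'], ordered=True)
reordered_codes = dict(enumerate(reordered.cat.categories))
print(reordered_codes)  # {0: 'Soldier', 1: 'Captain'}
```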

There is no way to specify a custom encoding for a Categorical column though, so if you need something more specific than a 0-to-max encoding, you should go with the method of Álvaro A. Gutiérrez Vargas.

Upvotes: 1

Here is an answer to your question. It is probably not what you were expecting, because I am not using DataFrame.to_stata but the Python integration introduced in Stata 16.

The code below must be executed within Stata (version 16 or later). Briefly, I create a pandas DataFrame (df) and export it to Stata. The trick is to apply the labels to the values using the ValueLabel.setLabelValue() method from the sfi (Stata Function Interface) module.

clear all

python:
from sfi import ValueLabel, Data
import pandas as pd

data = [['Eren Jaeger', 15, 1, 'Soldier'], ['Mikasa Ackerman', 14, 1, 'Soldier'], ['Armin Arlert', 14, 1, 'Soldier'], ['Levi Ackerman', 30, 2, 'Captain']]
# Create the DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank_num', 'Rank'])

##              Name  Age  Rank_num     Rank
##0      Eren Jaeger   15         1  Soldier
##1  Mikasa Ackerman   14         1  Soldier
##2     Armin Arlert   14         1  Soldier
##3    Levi Ackerman   30         2  Captain


# Set number of observations in Stata
Data.setObsTotal(len(df))

#Create variables on Stata (from Python)
Data.addVarStr("Name",10)
Data.addVarDouble("Age")
Data.addVarInt("Rank_num")

#Store the content of "df" object from Python to Stata
Data.store("Name", None, df['Name'], None)
Data.store("Age", None, df['Age'], None)
Data.store("Rank_num", None, df['Rank_num'], None)

# HERE is where I solve your question!
# 1) Create the labels
ValueLabel.setLabelValue('rank_num_LABEL', 1, 'Soldier')
ValueLabel.setLabelValue('rank_num_LABEL', 2, 'Captain')
ValueLabel.getValueLabels('rank_num_LABEL')

# 2) Attach the value label to the variable created above
ValueLabel.setVarValueLabel('Rank_num', 'rank_num_LABEL')

end 

br

* At the end, you will see the following on the Stata browser
* Name              Age Rank_num
* Eren Jaeger       15  Soldier
* Mikasa Ackerman   14  Soldier
* Armin Arlert      14  Soldier
* Levi Ackerman     30  Captain

If you want to better understand the reasoning behind the code above, here are the references I learned it from:

  1. Stata/Python integration part 9: Using the Stata Function Interface to copy data from Python to Stata
  2. Stata/Python integration part 8: Using the Stata Function Interface to copy data from Stata to Python
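
As a side note, the code-to-label pairs do not have to be typed by hand: they can be derived from the Rank_num and Rank columns of df. The sketch below is plain pandas and runs outside Stata. The names pairs and rank_labels are introduced here for illustration only.

```python
import pandas as pd

data = [['Eren Jaeger', 15, 1, 'Soldier'], ['Mikasa Ackerman', 14, 1, 'Soldier'],
        ['Armin Arlert', 14, 1, 'Soldier'], ['Levi Ackerman', 30, 2, 'Captain']]
df = pd.DataFrame(data, columns=['Name', 'Age', 'Rank_num', 'Rank'])

# One row per distinct (code, label) combination.
pairs = df[['Rank_num', 'Rank']].drop_duplicates()

# Guard against a code that maps to more than one label.
if not pairs['Rank_num'].is_unique:
    raise ValueError("each Rank_num code must map to exactly one Rank label")

# Plain Python ints as keys, label strings as values.
rank_labels = {int(code): label for code, label in zip(pairs['Rank_num'], pairs['Rank'])}
print(rank_labels)  # {1: 'Soldier', 2: 'Captain'}
```

Inside the python: block, each pair could then be passed to ValueLabel.setLabelValue() in a loop instead of the two hard-coded calls. Outside Stata, the same dictionary has the shape that the value_labels argument of to_stata expects for a single column.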

Upvotes: 5
