Reputation: 193
I have data with very odd delimiters:
1,|ABC1|,|BUD|,|Fed Budget & Appropriations|,|t1|
2,|ABC2|,|LBR|,|Labor, Antitrust & Workplace|,|t2|
3,|ABC3|,|UNM|,|Unemployment|,|t1|
So the delimiter is a comma and each variable, but the first one (the identifier) is between two pipes. The problem is that the fourth variable also uses commas, so I can't simply use commas as delimiters and delete the pipes. I have found a way to work the data by doing some find and replace operations through the terminal, but I would like to do this through Stata. Does anyone have an idea how to?
Upvotes: 1
Views: 269
Reputation: 37208
I put your data example into a text file and found that the delimiters were detected quite well automatically. Then I drop
ped any variable that was all commas or all missing, using findname
from the Stata Journal.
. import delimited "troublesome.txt"
(9 vars, 3 obs)
. list
+-------------------------------------------------------------------------+
| v1 v2 v3 v4 v5 v6 v7 v8 v9 |
|-------------------------------------------------------------------------|
1. | 1, ABC1 , BUD , Fed Budget & Appropriations , t1 . |
2. | 2, ABC2 , LBR , Labor, Antitrust & Workplace , t2 . |
3. | 3, ABC3 , UNM , Unemployment , t1 . |
+-------------------------------------------------------------------------+
. findname, all(@ == ",")
v3 v5 v7
. drop `r(varlist)'
. findname, all(missing(@))
v9
. drop `r(varlist)'
. destring v1, ignore(",") replace
v1: character , removed; replaced as byte
. list
+-----------------------------------------------------+
| v1 v2 v4 v6 v8 |
|-----------------------------------------------------|
1. | 1 ABC1 BUD Fed Budget & Appropriations t1 |
2. | 2 ABC2 LBR Labor, Antitrust & Workplace t2 |
3. | 3 ABC3 UNM Unemployment t1 |
+-----------------------------------------------------+
Upvotes: 2