Reputation: 1
First, please forgive my English. I'm trying to anonymise an XML file in bash, using xmllint and a random value. It works with different XPath forms (such as //Prestation/Libelle/@V or //IdVer[text()]), and when the XPath matches only one node everything is fine and the value is changed.
But as soon as the same XPath matches two or more nodes, it fails: the values are not changed, and I guess the loop is even exited early, because the echo prints only one value instead of one value per match in the XML file.
Can xmllint handle an XPath that matches more than one node? Should I use another language?
Here is the script (it was originally in French, so I translated the messages as best I could):
#!/bin/bash
# Directory containing the xml and xhl files
dir="/home/batch_spl_ia/jeremy"
while true; do
    # Ask for the xpath
    read -p "Enter the xpath (ex: /root/element/@attribut or //Prestation/Libelle/@V or //IdVer[text()]) : " xpath
    # Ask how many characters the anonymous value should have
    read -p "How many characters do you want? : " nb_caracteres
    # Ask what type of characters
    read -p "Do you want alpha (a), numeric (n) or alphanumeric (an)? (a/n/an) : " choice
    # Generate the string of characters
    if [ "$choice" == "a" ]; then
        replacement=$(openssl rand -base64 "$nb_caracteres" | tr -dc 'a-zA-Z' | head -c "$nb_caracteres")
    elif [ "$choice" == "n" ]; then
        replacement=$(cat /dev/urandom | tr -dc '0-9' | head -c "$nb_caracteres")
    else
        replacement=$(openssl rand -base64 "$nb_caracteres" | tr -dc 'a-zA-Z0-9' | head -c "$nb_caracteres")
    fi
    # Flag to check whether a value has been replaced
    value_replaced=false
    # Read the files in the directory
    for file in "$dir"/*.xml "$dir"/*.xhl; do
        if [ -e "$file" ]; then
            # Extract every value for the xpath in the file
            orig_value=$(xmllint --xpath "string($xpath)" "$file")
            echo "$orig_value"
            # Check whether a value has been found for the xpath
            if [ -n "$orig_value" ]; then
                # Replace every occurrence of the value with the replacement value
                xmllint --shell "$file" << EOF > /dev/null 2>&1
cd $xpath
set $replacement
save
EOF
                value_replaced=true
                echo "value after anonymization : $replacement"
                echo "file anonymized : $file"
            fi
        fi
    done
    # Check whether a value was replaced
    if [ "$value_replaced" = false ]; then
        echo "Error : no value found for the specified xpath : no replacement"
    fi
    echo "-----------------------"
    # Ask the user whether to anonymize another xpath
    read -p "Do you want to anonymize another xpath? (y/n) : " continuer
    if [ "$continuer" != "y" ]; then
        break
    fi
done
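A side note on the generation step in the script above: `openssl rand -base64 N | tr -dc ... | head -c N` can produce fewer than N characters, because `tr -dc` discards part of a base64 output that is not much longer than N to begin with. A minimal sketch of one way around that, assuming /dev/urandom is available, is to filter an endless stream and let `head` stop it. The helper name `random_string` is made up for illustration and is not part of the original script.

```shell
#!/bin/bash
# random_string LENGTH CLASS  (hypothetical helper, not from the original script)
# CLASS is "a" for letters, "n" for digits, anything else for letters and digits.
random_string() {
    local length="$1" charset
    case "$2" in
        a) charset='a-zA-Z' ;;
        n) charset='0-9' ;;
        *) charset='a-zA-Z0-9' ;;
    esac
    # LC_ALL=C makes tr treat the input as raw bytes; head ends the
    # otherwise endless stream once $length characters have been kept.
    LC_ALL=C tr -dc "$charset" < /dev/urandom | head -c "$length"
}

replacement=$(random_string 12 an)
echo "$replacement"
```

Because the filter runs on an unbounded stream, the result always has the requested length, whichever character class is chosen.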
When I try these two XPaths, the value is changed, because each of them matches only one node in the file: //Prestation/Libelle/@V and //DonneesIndiv/PayeIndivMensuel/Periode/DateDebut/@V
But when I use the following XPath, which matches two nodes, neither value is changed and the echo returns only one value instead of two: //DonneesIndiv/PayeIndivMensuel/Remuneration/Indemnite/Libelle/@V
Thank you for your understanding and help.
Upvotes: -1
Views: 130
Reputation: 12662
Find all the matching XPath expressions with the whereis command of the xmllint shell, then set each of them to the new value. Collecting the expressions in an array lets you change all the values and save once, instead of opening and saving the file for each expression.
file='tmp.xml'
xpath="//parent/paragraph/@newid"
# whereis prints one absolute path per matching node; drop the prompt lines
read -a expr_arr < <(printf "%s\n" "whereis $xpath" "bye" | xmllint --shell "$file" | grep -v '^[/] >' | tr '\n' ' ')
replacement="xxxyyyzzz"
declare -a arr
# Iterate over the found xpath expressions and queue a cd/set pair for each
for expr in "${expr_arr[@]}"; do
    arr+=("cd ${expr}")
    arr+=("set $replacement")
    echo "$expr value after anonymization : $replacement"
done
# Save once, after all the values have been set
arr+=("save")
arr+=("bye")
echo "file anonymized : $file"
printf '%s\n' "${arr[@]}" | xmllint --shell "$file"
# Print the new values to check the result
xmllint --xpath "$xpath" "$file"
Given this sample:
<?xml version="1.0"?>
<root>
  <parent>
    <paragraph newid="13">Any text</paragraph>
  </parent>
  <parent>
    <paragraph newid="43">Any text</paragraph>
    <paragraph newid="as77">Any text</paragraph>
  </parent>
</root>
Output showing the expressions used:
/root/parent[1]/paragraph/@newid value after anonymization : xxxyyyzzz
/root/parent[2]/paragraph[1]/@newid value after anonymization : xxxyyyzzz
/root/parent[2]/paragraph[2]/@newid value after anonymization : xxxyyyzzz
file anonymized : tmp.xml
/ > cd /root/parent[1]/paragraph/@newid
newid > set xxxyyyzzz
newid > cd /root/parent[2]/paragraph[1]/@newid
newid > set xxxyyyzzz
newid > cd /root/parent[2]/paragraph[2]/@newid
newid > set xxxyyyzzz
newid > save
newid > bye
newid="xxxyyyzzz"
newid="xxxyyyzzz"
newid="xxxyyyzzz"
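As for why the original script only ever saw one value: in XPath 1.0, string() applied to a node-set returns the string-value of the first node in document order, so the other matches are never looked at, and the cd command of the xmllint shell only accepts an expression that selects a single node. The sketch below is an alternative to whereis under those constraints: it counts the matches with count() and addresses them one by one as ($xpath)[i]. The function name `anonymize_all` is made up for illustration, and the replacement is assumed to contain no XML markup or line breaks.

```shell
#!/bin/bash
# anonymize_all FILE XPATH REPLACEMENT  (hypothetical helper name)
# Sets every node matched by XPATH to REPLACEMENT and saves FILE once.
# Prints the number of nodes changed; returns 1 if nothing matched.
anonymize_all() {
    local file="$1" xpath="$2" replacement="$3" count i
    # count() reports the size of the node-set, whereas string() would
    # only look at its first node.
    count=$(xmllint --xpath "count($xpath)" "$file") || return 1
    [ "$count" -gt 0 ] || return 1
    {
        i=1
        while [ "$i" -le "$count" ]; do
            # ($xpath)[i] selects exactly one node, which cd accepts.
            printf 'cd (%s)[%d]\nset %s\n' "$xpath" "$i" "$replacement"
            i=$((i + 1))
        done
        printf 'save\nbye\n'
    } | xmllint --shell "$file" > /dev/null
    echo "$count"
}

# Example call (assumes tmp.xml is the sample file shown above):
#   anonymize_all tmp.xml '//parent/paragraph/@newid' xxxyyyzzz
```

Compared with whereis, this avoids parsing the shell's output, at the cost of evaluating the expression once per match.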
Upvotes: 0