Jhilumine

Reputation: 1

Anonymise an XML file in shell with xmllint

First, please forgive my English. I'm trying to anonymise an XML file in bash, using xmllint and a random value. It works with different kinds of XPath (such as //Prestation/Libelle/@V or //IdVer[text()]), and when the XPath matches only one node everything is fine and the value is changed.

But as soon as the same XPath matches two or more nodes, it gets lost. I guess it even leaves the loop, because the echo returns only one value instead of one value for each match at the different lines of the XML file.

Can xmllint deal with an XPath that matches more than one node? Should I use another language?

Here is the script (it was in French originally, so I did my best to translate the French sentences; I hope everything is understandable):

#!/bin/bash

#Dir with xml and xhl files
dir="/home/batch_spl_ia/jeremy"

while true; do
    # ask for xpath
    read -p "Enter the xpath (ex: /root/element/@attribut or //Prestation/Libelle/@V or //IdVer[text()]) : " xpath

    # ask how many character for the anonymous value
    read -p "How many character do you want ? : " nb_caracteres

    # ask what type of character
    read -p "Do you want alpha (a), digital (n) or alphanumeric (an)? (a/n/an) : " choice

    # Generate string of characters
    if [ "$choice" == "a" ]; then
        replacement=$(openssl rand -base64 "$nb_caracteres" | tr -dc 'a-zA-Z' | head -c "$nb_caracteres")
    elif [ "$choice" == "n" ]; then
        replacement=$(cat /dev/urandom | tr -dc '0-9' | head -c "$nb_caracteres")
    else
        replacement=$(openssl rand -base64 "$nb_caracteres" | tr -dc 'a-zA-Z0-9' | head -c "$nb_caracteres")
    fi

    # Variable to check if a value has been replaced
    value_replaced=false

    # read files in directory
    for file in "$dir"/*.xml "$dir"/*.xhl; do
        if [ -e "$file" ]; then
            # Extract every value for the xpath in file
            orig_value=$(xmllint --xpath "string($xpath)" "$file")
            echo "$orig_value"
            # Check if value has been found for the xpath
            if [ -n "$orig_value" ]; then
                # Replace every appearance of value with the replacement value
                xmllint --shell "$file" << EOF > /dev/null 2>&1
cd $xpath
set $replacement
save
EOF
                value_replaced=true
                echo "value after anonymization : $replacement"
                echo "file anonymized : $file"
            fi
        fi
    done

    # Check if value replaced
    if [ "$value_replaced" = false ]; then
        echo "Error : no value found for the xpath specified : no replacement"
    fi
   
    echo "-----------------------"

    # Ask user if he wants to anonymize another xpath
    read -p "Do you want to anonymize another xpath ?(y/n) : " continuer

    if [ "$continuer" != "y" ]; then
        break
    fi
done

When I try these two XPaths, the value is changed because each one matches only once in the file: //Prestation/Libelle/@V and //DonneesIndiv/PayeIndivMensuel/Periode/DateDebut/@V

But when I use this XPath, which matches twice, neither of the two values is changed and the echo returns only one value instead of two: //DonneesIndiv/PayeIndivMensuel/Remuneration/Indemnite/Libelle/@V

Thank you for your understanding and help

Upvotes: -1

Views: 130

Answers (1)

LMC

Reputation: 12662

Find the absolute path of every node matching the XPath with the whereis command of the xmllint shell, then set each of them to the new value. Collecting the shell commands in an array lets you change all the values and save once, instead of opening and saving the file for each expression.

file='tmp.xml'
xpath="//parent/paragraph/@newid"
replacement="xxxyyyzzz"

# Ask the xmllint shell for the absolute path of every node matching $xpath,
# dropping the prompt lines ("/ > ...") from its output
read -r -a expr_arr < <(printf "%s\n" "whereis $xpath" "bye" | xmllint --shell "$file" | grep -v '^[/] >' | tr '\n' ' ')

declare -a arr

# Queue a cd/set pair for each path found
for expr in "${expr_arr[@]}"; do
    arr+=("cd ${expr}")
    arr+=("set $replacement")

    echo "$expr value after anonymization : $replacement"
done

# Save once at the end, then leave the shell
arr+=("save" "bye")

echo "file anonymized : $file"
printf '%s\n' "${arr[@]}" | xmllint --shell "$file"

# Print the matching nodes to check the result
xmllint --xpath "$xpath" "$file"
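
The replacement value is hard-coded here for illustration. The generator in the question filters the output of openssl rand -base64 N and then cuts it to N characters, which can leave fewer than N characters once the unwanted ones are stripped. A minimal sketch of a generator that always returns the requested length, reading from /dev/urandom instead (random_string is a hypothetical helper name, not part of either script):

```shell
#!/bin/bash
# Hypothetical helper (not from the original scripts): print exactly
# $1 random characters drawn from the tr character set given in $2.
random_string() {
    local length=$1 charset=$2
    # LC_ALL=C makes tr treat /dev/urandom as raw bytes; head ends the
    # pipeline as soon as enough matching characters have been collected.
    LC_ALL=C tr -dc "$charset" < /dev/urandom | head -c "$length"
}

# Same three choices as in the question: 'a-zA-Z', '0-9' or 'a-zA-Z0-9'.
replacement=$(random_string 9 'a-zA-Z0-9')
echo "$replacement"
```

Because tr keeps reading until head has its N characters, the length no longer depends on how many bytes survive the filter.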

Given this sample (tmp.xml):

<?xml version="1.0"?>
<root>
  <parent>
    <paragraph newid="13">Any text</paragraph> 
  </parent>
  <parent>
    <paragraph newid="43">Any text</paragraph> 
    <paragraph newid="as77">Any text</paragraph> 
  </parent>
</root>

Output of the anonymization script, showing the expressions used:

/root/parent[1]/paragraph/@newid value after anonymization : xxxyyyzzz
/root/parent[2]/paragraph[1]/@newid value after anonymization : xxxyyyzzz
/root/parent[2]/paragraph[2]/@newid value after anonymization : xxxyyyzzz
file anonymized : tmp.xml
/ > cd /root/parent[1]/paragraph/@newid
newid > set xxxyyyzzz
newid > cd /root/parent[2]/paragraph[1]/@newid
newid > set xxxyyyzzz
newid > cd /root/parent[2]/paragraph[2]/@newid
newid > set xxxyyyzzz
newid > save
newid > bye
 newid="xxxyyyzzz"
 newid="xxxyyyzzz"
 newid="xxxyyyzzz"
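
As for why the script in the question only ever printed one value: in XPath 1.0, string() applied to a node-set converts only the first node in document order, so string($xpath) cannot list every match. A minimal sketch of this, assuming the sample above is recreated in a temporary file (the temp file and the variable names first, count and values are illustrative, not from the original scripts):

```shell
#!/bin/bash
# Sketch only: recreate the sample from this answer in a temporary file.
file=$(mktemp)
cat > "$file" <<'EOF'
<?xml version="1.0"?>
<root>
  <parent>
    <paragraph newid="13">Any text</paragraph>
  </parent>
  <parent>
    <paragraph newid="43">Any text</paragraph>
    <paragraph newid="as77">Any text</paragraph>
  </parent>
</root>
EOF
xpath="//parent/paragraph/@newid"

# string() on a node-set converts only the first node in document order.
first=$(xmllint --xpath "string($xpath)" "$file")

# count() reports how many nodes actually match.
count=$(xmllint --xpath "count($xpath)" "$file")

# (expr)[i] selects the i-th match, so each value can be read in turn.
values=()
for ((i = 1; i <= count; i++)); do
    values+=("$(xmllint --xpath "string(($xpath)[$i])" "$file")")
done

echo "first=$first count=$count"
printf '%s\n' "${values[@]}"
rm -f "$file"
```

On the sample this reports first=13 and count=3, followed by the three original values, so a non-empty string() result says nothing about how many nodes still have to be replaced.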

Upvotes: 0
