Daniel Crisp
Daniel Crisp

Reputation: 127

Handling expanded git commands with python subprocess module

I'm trying to retrieve and work with data from historical versions of files in a git repo. I'd like to have something like a dictionary that holds <hash>, <time of commit>, <value retrieved from contents of a file revision>, <commit message> for each entry.

I figured the data I retrieve from each file revision, and any calculations done with them, would be best handled using python. And the subprocess module appeared to be the best fit to integrate my git commands.

Below I show how I'm defining a function getval(key, filename) that I had hoped would output <SHA-1 hash>:<Value> to console, but would like to have a dict with more info... also with <time>, and <commit message>.

I help operate an ion accelerator, where we store 'savesets'--or values relevant to a given accelerator tune--using git. Of the values in these files, are things like charge(Q) and mass(A). Ultimately, I want to retrieve both values, get the ratio (Q/A), and display a list of file revision hashes sorted by the charge:mass ratio of the ion we delivered with the settings in that file's revision.

Sample of file (for 56Fe17+):

# Date: 2018-12-21 01:49:16.888 PV,SELECTED,TIMESTAMP,STATUS,SEVERITY,VALUE_TYPE,VALUE,READBACK,READBACK_VALUE,DELTA,READ_ONLY REA_EXP:LINE,0,1544047322.881066957,NO_ALARM,NONE,enum,"JENSA~[UDF;AT-TPC;GPL;JENSA]",,"---",,true REA_BTS19:BEAM:OPTICSFILE,0,1541798820.065952460,NO_ALARM,NONE,string,"BTS19_test3.data",,"---",,true REA_BTS19:BEAM:A_BOOK,0,1545322510.562031883,NO_ALARM,NONE,double,"56.0",,"---",,true REA_BTS19:BEAM:Z_BOOK,0,1545322567.544226340,NO_ALARM,NONE,double,"26.0",,"---",,true REA_BTS19:BEAM:Q_BOOK,0,1545322512.701768974,NO_ALARM,NONE,double,"17.0",,"---",,true

So far--and with the help of others here--I've figured out a git one-liner that greps the revision history of a given file for a key[a string] and uses sed and awk to output <hash>:<val associated with the key>.

Git Oneliner I'm Starting with:

git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp | sed 's/:/,/' | awk -F, '{print $1 ":" $8}'

Oneliner's Output

e78f73fe6f90e93d5b3ccf90975b0e540d12ce09:"56.0" 4b94745bd0a6594bb42a774c95b5fc0847ef2d82:"56.0" f2d5e263deac1d9112be791b39f4ce1b1b34e55d:"56.0" c03800de52143ddb2abfab51fcc665ff5470e363:"56.0" 4a3a564a6d87bc6ff5f3dc7fec7670aeecfe6a79:"58.0" d591941e51c4eab1237ce726a2a49448114b8f26:"58.0" a9c8f5cdf224ff4fd94514c33888796760afd792:"58.0" 2f221492beea1663216dcfb27da89343817b11fd:"58.0"

I've also started playing with the subprocess python module. But I'm struggling to figure out how to handle my more complicated git commands. Generally, I'll want to be able to pass a key, and a file.. something like getval(key, filename).

When my cmd string was ['git', 'grep', str, '$(git rev-list --all)', '--', pathspec], it returned errors stating that '$(git rev-list --all)' was ambiguous. Thinking it wasn't being expanded, I added a separate process to execute the nested command, but I'm not sure I'm doing this correctly.

My Python file (gitfun.py): which I'm currently running the function from

import sys, os
import subprocess

def getval(str, pathspec, repoDir='/mnt/d/stash.projects/rea'):
    p1 = subprocess.Popen(["git", "rev-list", "--all"], stdout=subprocess.PIPE)
    output, err = p1.communicate()

    cmd = ['git', 'grep', str, output, '--', pathspec]        
    p2 = subprocess.Popen(cmd, cwd=repoDir)
    p2.wait()

cwd = '/mnt/d/stash.projects/rea'
filename = 'ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp'
os.chdir(cwd)
getval('BTS19:BEAM:A_BOOK', filename)

Currently it is returning 'file name too long' so (even though I'm not convinced it really is too long) I tried changing my core.longpaths in git config to true, however this had no effect. Again why I suspect I'm not handling my replacement of the $(git rev-list --all) expansion correctly.

For this code, I expect something that looks like this:

522628b8d3db01ac330240b28935933b0448649c:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.74320185 5,NO_ALARM,NONE,double,"58.0",,"---",,true 2557c599d2dc67d80ffc5b9be3f79899e0c15a10:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.74320185 5,NO_ALARM,NONE,double,"58.0",,"---",,true 7fc97ec2aa76f32265196c42dbcd289c49f0ad93:ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp:REA_BTS19:BEAM:A_BOOK,0,1545240215.74320185 5,NO_ALARM,NONE,double,"58.0",,"---",,true ...

But I ultimately want an output to console that looks identical to the git one-liner above, or better yet, a dict that I can print to console or do other things with.

Upvotes: 0

Views: 1170

Answers (1)

larsks
larsks

Reputation: 312470

Remember that your shell tokenizes the command line using white space.

When you run git rev-list --all, you get output like:

2a4be2748fad885f88163a5b9b1b438fe3cb2ece
c1a30c743eb810fbefe1dc314277931fa33842b3
b2e5c75131e94a3543e5dcf9fb641ccd553906b4
95718f7e128a8b36ca93d6589328cc5b739668b1
87a9ada188a8cd1c13e48c21f093be7027d61eca

When you substitute that into your git grep command...

git grep 'BTS19:BEAM:A_BOOK' $(git rev-list --all) -- \
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp

...each line is a separate argument. That is, if the output of git rev-list --all was exactly what I've shown above, then your one-liner would be tokenized into the following arguments, which I have listed one per line for clarity:

git
grep
BTS19:BEAM:A_BOOK
2a4be2748fad885f88163a5b9b1b438fe3cb2ece
c1a30c743eb810fbefe1dc314277931fa33842b3
b2e5c75131e94a3543e5dcf9fb641ccd553906b4
95718f7e128a8b36ca93d6589328cc5b739668b1
87a9ada188a8cd1c13e48c21f093be7027d61eca
--
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp

But you're not doing this in your Python code! You're pasing the entire output of git rev-list --all as a single argument. That means the command you're trying to execute has a fixed number (6) of arguments:

git
grep
BTS19:BEAM:A_BOOK
2a4be2748fad885f88163a5b9b1b438fe3cb2ece c1a30c743eb810fbefe1dc314277931fa33842b3 b2e5c75131e94a3543e5dcf9fb641ccd553906b4 95718f7e128a8b36ca93d6589328cc5b739668b1 87a9ada188a8cd1c13e48c21f093be7027d61eca
--
ReAccelerator/Snapshots/RFQ-JENSA_Setpoints.snp

All those revisions are getting bundled together in a single argument, which is where the "filename too long" error comes from. You need to split that output into multiple arguments just like the shell does:

p1 = subprocess.Popen(["git", "rev-list", "--all"], stdout=subprocess.PIPE)
output, err = p1.communicate()

cmd = ['git', 'grep', str] + output.splitlines() + ['--', pathspec]        
p2 = subprocess.Popen(cmd, cwd=repoDir)
p2.wait()

Upvotes: 2

Related Questions