digitalbyte
digitalbyte

Reputation: 95

Python parsing table data into an array

Firstly I am using python 2.7.

What I am trying to achieve is to separate table data from a active directory lookup into "Firstname Lastname" items in an array, which I can later compare to a different array and see which users in the list do not match.

I run dsquery group domainroot -name groupname | dsget group -members | dsget user -fn -ln which outputs a list as such:

  fn               ln               
  Peter            Brill            
  Cliff            Lach          
  Michael          Tsu               
  Ashraf           Shah             
  Greg             Coultas          
  Yi               Li               
  Brad             Black            
  Kevin            Schulte          
  Raymond          Masters (Admin)  
  James            Rapp            
  Allison          Wurst            
  Benjamin         Hammel            
  Edgar            Cuevas           
  Vlad             Dorovic (Admin)        
  Will             Wang                        
dsget succeeded

Notice this list has both spaces before and after each data set.

The code I am using currently:

userarray = []
p = Popen(["cmd.exe"], stdin=PIPE, stdout=PIPE)
p.stdin.write("dsquery group domainroot -name groupname | dsget group -members | dsget user -fn -ln\n")
p.stdin.write("exit\n")
processStdout = p.stdout.read().replace("\r\n", "").strip("")[266:]
cutWhitespace = ' '.join(processStdout.split()).split("dsget")[0]
processSplit = re.findall('[A-Z][^A-Z]*', cutWhitespace)
userarray.append(processSplit)
print userarray

My problem is that when I split on the blank space and attempt to re-group them into "Firstname Lastname" when it hits the line in the list that has (Admin) the grouping gets thrown off because there is a third field. Here is a sample of what I mean:

['Brad ', 'Black ', 'Kevin ', 'Schulte ', 'Raymond ', 'Masters (', 'Admin) ', 'James ', 'Rapp ', 'Allison ', 'Wurst ',

I would appreciate any suggestions on how to group this better or correctly. Thanks!

Upvotes: 0

Views: 163

Answers (2)

Gary Wisniewski
Gary Wisniewski

Reputation: 1120

split has a maxsplit argument you so you can tell it to split only the first separator, so you could say:

cutWhitespace = ' '.join(processStdout.split(None,1)).split("dsget")[0]

on your sixth line to tell it to split no more than once.

Upvotes: 0

njzk2
njzk2

Reputation: 39406

# the whole file.
content = p.stdout.read()
# each line as a single string
lines = content.split()
# lets drop the header and the last line
lines = lines[1:-1]
# Notice how the last name starts at col 19
names = [(line[:19].strip(), line[19:].strip()) for line in lines]
print(names)
=> [('Peter', 'Brill'), ('Cliff', 'Lach'), ('Michael', 'Tsu'), ('Ashraf', 'Shah'), ('Greg', 'Coultas'), ('Yi', 'Li'), ('Brad', 'Black'), ('Kevin', 'Schulte'), ('Raymond', 'Masters (Admin)'), ('James', 'Rapp'), ('Allison', 'Wurst'), ('Benjamin', 'Hammel'), ('Edgar', 'Cuevas'), ('Vlad', 'Dorovic (Admin)'), ('Will', 'Wang')]

Now, if the column size change, just do index = lines[0].indexof('ln') before dropping the header and use that instead of 19

Upvotes: 1

Related Questions