Timothy Williams

Reputation: 217

Dataframe command works in IPython but not script

I have a small program file; here is the relevant code:

import numpy as np
import pandas as pd
from docx import Document


####    Setup the file names, also make provisions for having the user select the file   ####
SHRD_filename = "SHRD - SVN 12485.docx"
SHDD_filename = "SHDD - SVN 12485.doc"
#SHRD_name = PCB_utility.get_file('Select SHRD file')
#SHDD_name = PCB_utility.get_file('Select SHDD file')

data = []
keys = {}

document_SHRD = Document(SHRD_filename)
tables_SHRD = document_SHRD.tables[30]
for i, row in enumerate(tables_SHRD.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
        keys = tuple(text)
        continue

    row_data = dict(zip(keys, text))
    data.append(row_data)

df_SHRD = pd.DataFrame.from_dict(data)
#cols = df_SHRD.columns.tolist()

print(df_SHRD.tail(20))

s = df_SHRD['HLR Trace Tag'].str.split('  ').apply(pd.Series).stack()
s.index = s.index.droplevel(-1)
s.name = 'HLR Tags'
del df_SHRD['HLR Trace Tag']

df_SHRD.join(s)
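A self-contained version of the steps above, for reproducing the behaviour without the Word file. This is a sketch under assumptions: the table rows are hand-written lists of strings standing in for `tables_SHRD.rows`, and the second column name `"LRU Tag"` and the sample values are hypothetical. It uses `str.split(..., expand=True)` in place of `.apply(pd.Series)`, and adds `.dropna()` after `stack()` because newer pandas versions keep missing values when stacking.

```python
import pandas as pd

# Hypothetical stand-in for tables_SHRD.rows: the first row is the header,
# the rest are cell texts; HLR tags are separated by two spaces.
rows = [
    ["HLR Trace Tag", "LRU Tag"],
    ["HLR-0000094  HLR-0000095  HLR-0000340", "LRU-0000440"],
    ["HLR-0000675", "LRU-0000745"],
]

# Same idea as the loop above: zip the header with each data row.
keys = tuple(rows[0])
data = [dict(zip(keys, row)) for row in rows[1:]]
df_SHRD = pd.DataFrame(data)

# One entry per tag: split into columns, stack into a long Series,
# drop padding NAs, then drop the inner level so each tag keeps its row label.
s = df_SHRD["HLR Trace Tag"].str.split("  ", expand=True).stack().dropna()
s.index = s.index.droplevel(-1)
s.name = "HLR Tags"
del df_SHRD["HLR Trace Tag"]

df_SHRD.join(s)                   # computed, then discarded in a script
print(df_SHRD.columns.tolist())   # still only the LRU column
```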

When I initially make the dataframe, it looks like this:

300  HLR-0000094  HLR-0000095  HLR-0000340   LRU-0000440
301  HLR-0000094  HLR-0000095  HLR-0000341   LRU-0000441
302  HLR-0000094  HLR-0000095  HLR-0000342   LRU-0000442
303                            HLR-0000675   LRU-0000745
304                            HLR-0000676   LRU-0000746
305                            HLR-0000677   LRU-0000747
306                            HLR-0000678   LRU-0000748
307                            HLR-0000679   LRU-0000749
308                            HLR-0000680   LRU-0000750

I need to split the HLR tags into individual rows. At the end of my program, though, df_SHRD looks like this:

300   LRU-0000440
301   LRU-0000441
302   LRU-0000442
303   LRU-0000745
304   LRU-0000746
305   LRU-0000747
306   LRU-0000748
307   LRU-0000749
308   LRU-0000750

But when I retype the join in the IPython console:

In [25]:df_SHRD.join(s)
Out[25]: 
300   LRU-0000440  HLR-0000094
300   LRU-0000440  HLR-0000095
300   LRU-0000440  HLR-0000340
301   LRU-0000441  HLR-0000094
301   LRU-0000441  HLR-0000095
301   LRU-0000441  HLR-0000341
302   LRU-0000442  HLR-0000094
302   LRU-0000442  HLR-0000095
302   LRU-0000442  HLR-0000342
303   LRU-0000745  HLR-0000675
304   LRU-0000746  HLR-0000676
305   LRU-0000747  HLR-0000677
306   LRU-0000748  HLR-0000678
307   LRU-0000749  HLR-0000679
308   LRU-0000750  HLR-0000680

[457 rows x 2 columns]

I'd appreciate any help understanding why the command works in the IPython window but not in the script.

Upvotes: 1

Views: 117

Answers (1)

cs95

Reputation: 402902

DataFrame.join(other, ...)

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Returns: joined : DataFrame
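To illustrate the docstring, here is a small sketch with made-up frames (all names and values are hypothetical). It shows two things: joining on the index repeats a left row once for every matching label in the right-hand Series, which is why rows 300 to 302 appear three times in the IPython output; and passing a list joins several index-aligned objects in one call. In both cases the result is a new DataFrame.

```python
import pandas as pd

# Left frame labelled by original row number (hypothetical data).
left = pd.DataFrame({"LRU Tag": ["LRU-0000440", "LRU-0000745"]}, index=[300, 303])

# Right-hand Series with a repeated label, like the output of split/stack/droplevel.
tags = pd.Series(["HLR-0000094", "HLR-0000340", "HLR-0000675"],
                 index=[300, 300, 303], name="HLR Tags")

joined = left.join(tags)   # default how="left", aligned on the index
print(joined)              # row 300 appears twice, once per matching tag

# Passing a list joins several objects by index at once (unique indexes here).
notes = pd.DataFrame({"Note": ["a", "b"]}, index=[300, 303])
owners = pd.DataFrame({"Owner": ["x", "y"]}, index=[300, 303])
wide = left.join([notes, owners])
print(wide.columns.tolist())
```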

  1. join is not an in-place operation. It returns a new DataFrame, which you must assign to a variable if you want to keep the result.

    df = df_SHRD.join(s)
    
  2. IPython automatically displays the value of an expression typed at the prompt, even without a print call; a script does not. That is simply how IPython's REPL works. In either case, nothing is stored unless you assign the result. Try evaluating df_SHRD.join(s) followed by df_SHRD in IPython, and you'll see that df_SHRD itself has not changed.
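A minimal sketch of both points, using a small hypothetical frame in place of the one built from the Word table. The bare call leaves df_SHRD untouched; assigning the result, either to a new name or back to df_SHRD, keeps the joined rows. In IPython the bare line would also echo the joined frame, while a script prints nothing for it.

```python
import pandas as pd

# Hypothetical stand-ins for df_SHRD and s from the question.
df_SHRD = pd.DataFrame({"LRU Tag": ["LRU-0000440", "LRU-0000745"]}, index=[300, 303])
s = pd.Series(["HLR-0000094", "HLR-0000340", "HLR-0000675"],
              index=[300, 300, 303], name="HLR Tags")

df_SHRD.join(s)                 # result is built and thrown away
unchanged_shape = df_SHRD.shape
print(unchanged_shape)          # (2, 1): df_SHRD was not modified

df = df_SHRD.join(s)            # keep the result under a new name
print(df.shape)                 # (3, 2)

df_SHRD = df_SHRD.join(s)       # or rebind the original name
print(df_SHRD.shape)            # (3, 2)
```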

Upvotes: 1
