Reputation: 217
I have a small script; here is the relevant code:
import numpy as np
import pandas as pd
from docx import Document
#### Setup the file names, also make provisions for having the user select the file ####
SHRD_filename = "SHRD - SVN 12485.docx"
SHDD_filename = "SHDD - SVN 12485.doc"
#SHRD_name = PCB_utility.get_file('Select SHRD file')
#SHDD_name = PCB_utility.get_file('Select SHDD file')
data = []
keys = {}
document_SHRD = Document(SHRD_filename)
tables_SHRD = document_SHRD.tables[30]
for i, row in enumerate(tables_SHRD.rows):
    text = (cell.text for cell in row.cells)
    if i == 0:
        keys = tuple(text)
        continue
    row_data = dict(zip(keys, text))
    data.append(row_data)
df_SHRD = pd.DataFrame.from_dict(data)
#cols = df_SHRD.columns.tolist()
print(df_SHRD.tail(20))
s = df_SHRD['HLR Trace Tag'].str.split(' ').apply(pd.Series, 1).stack()
s.index = s.index.droplevel(-1)
s.name = 'HLR Tags'
del df_SHRD['HLR Trace Tag']
df_SHRD.join(s)
When I initially make the dataframe, it looks like this:
300 HLR-0000094 HLR-0000095 HLR-0000340 LRU-0000440
301 HLR-0000094 HLR-0000095 HLR-0000341 LRU-0000441
302 HLR-0000094 HLR-0000095 HLR-0000342 LRU-0000442
303 HLR-0000675 LRU-0000745
304 HLR-0000676 LRU-0000746
305 HLR-0000677 LRU-0000747
306 HLR-0000678 LRU-0000748
307 HLR-0000679 LRU-0000749
308 HLR-0000680 LRU-0000750
I need to split the HLR tags into individual rows. At the end of my program it comes back as this:
300 LRU-0000440
301 LRU-0000441
302 LRU-0000442
303 LRU-0000745
304 LRU-0000746
305 LRU-0000747
306 LRU-0000748
307 LRU-0000749
308 LRU-0000750
But when I retype the last line in IPython, I get:
In [25]:df_SHRD.join(s)
Out[25]:
300 LRU-0000440 HLR-0000094
300 LRU-0000440 HLR-0000095
300 LRU-0000440 HLR-0000340
301 LRU-0000441 HLR-0000094
301 LRU-0000441 HLR-0000095
301 LRU-0000441 HLR-0000341
302 LRU-0000442 HLR-0000094
302 LRU-0000442 HLR-0000095
302 LRU-0000442 HLR-0000342
303 LRU-0000745 HLR-0000675
304 LRU-0000746 HLR-0000676
305 LRU-0000747 HLR-0000677
306 LRU-0000748 HLR-0000678
307 LRU-0000749 HLR-0000679
308 LRU-0000750 HLR-0000680
[457 rows x 2 columns]
Any help would be appreciated on why the command works in the IPython window but not in the script.
Upvotes: 1
Views: 117
Reputation: 402902
DataFrame.join(other, ...)
Join columns with other DataFrame either on index or on a key column. Efficiently Join multiple DataFrame objects by index at once by passing a list.
Returns: joined : DataFrame
join is not an in-place operation. It returns a new DataFrame, which you must assign to a variable if you want to keep the result.
df = df_SHRD.join(s)
IPython echoes the value of the last expression you type, even without a print call, while a script shows nothing unless you print explicitly. That is just how IPython's REPL works. In either case, df_SHRD itself is unchanged until you assign the result back. Try evaluating df_SHRD.join(s) followed by df_SHRD in IPython, and you'll see that the second one still lacks the new column.
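To make the difference concrete, here is a minimal sketch with made-up data. The column name 'LRU Tag' and the three sample rows are assumptions for illustration only; the real frame comes from the .docx table in the question.

```python
import pandas as pd

# Hypothetical stand-in for df_SHRD after 'HLR Trace Tag' has been deleted.
# The column name 'LRU Tag' is an assumption, not taken from the question.
df_SHRD = pd.DataFrame({"LRU Tag": ["LRU-0000440", "LRU-0000745"]},
                       index=[300, 303])

# Hypothetical stand-in for s: one tag per entry, index repeated per source row.
s = pd.Series(["HLR-0000094", "HLR-0000095", "HLR-0000675"],
              index=[300, 300, 303], name="HLR Tags")

df_SHRD.join(s)                # result is built, then thrown away
shape_before = df_SHRD.shape   # still the original frame

df_SHRD = df_SHRD.join(s)      # assign the result back to keep it
shape_after = df_SHRD.shape

print(shape_before, shape_after)
print(df_SHRD)                 # a script needs an explicit print to show it
```

In a script, the bare df_SHRD.join(s) line runs without error but has no visible effect, which matches what the question describes.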
Upvotes: 1