Reputation: 130
Given the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'term': ['analys', 'applic', 'architectur', 'assess', 'item', 'methodolog', 'research', 'rs', 'studi', 'suggest', 'test', 'tool', 'viewer', 'work'],
    'newValue': [0.810419, 0.631963, 0.687348, 0.810554, 0.725366, 0.742715, 0.799152, 0.599030, 0.652112, 0.683228, 0.711307, 0.625563, 0.604190, 0.724763]})
df = df.set_index('term')
print(df)
             newValue
term
analys       0.810419
applic       0.631963
architectur  0.687348
assess       0.810554
item         0.725366
methodolog   0.742715
research     0.799152
rs           0.599030
studi        0.652112
suggest      0.683228
test         0.711307
tool         0.625563
viewer       0.604190
work         0.724763
I am trying to replace the value after each "^" in this string with the corresponding value from the DataFrame.
(analysi analys^0.8046919107437134 studi^0.6034331321716309 framework methodolog^0.7360332608222961 architectur^0.6806665658950806)^0.0625 (recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 (system tool^0.6207902431488037 applic^0.610009491443634)^0.25 (evalu assess^0.7828741073608398 test^0.6444937586784363)^0.5
Each value should be looked up by the word that precedes its "^", so that I get this:
(analysi analys^0.810419 studi^0.652112 framework methodolog^0.742715 architectur^0.687348)^0.0625 (recommend suggest^0.683228 rs^0.599030)^0.125 (system tool^0.625563 applic^0.631963)^0.25 (evalu assess^0.810554 test^0.711307)^0.5
Thanks in advance for helping!
Upvotes: 1
Views: 67
Reputation: 405715
The best way I could come up with does this in multiple stages.
First, take the old string and extract all the values that you want to replace. That can be done with a regular expression.
import re

old_string = "(analysi analys^0.8046919107437134 studi^0.6034331321716309 framework methodolog^0.7360332608222961 architectur^0.6806665658950806)^0.0625 (recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 (system tool^0.6207902431488037 applic^0.610009491443634)^0.25 (evalu assess^0.7828741073608398 test^0.6444937586784363)^0.5"
pattern = re.compile(r"(\w+\^(0|[1-9]\d*)(\.\d+)?)")
# pattern.findall(old_string) returns a list of tuples,
# so we need to keep just the outer capturing group for each match.
matches = [m[0] for m in pattern.findall(old_string)]
print("Matches:", matches)
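As a side note on why the list comprehension above is needed: `findall` returns tuples whenever the pattern contains more than one capturing group. A minimal sketch of an alternative, assuming the inner groups are not needed afterwards, is to make them non-capturing so that `findall` returns plain strings. The names `sample`, `grouped`, and `flat` are made up for this illustration.

```python
import re

# Shortened sample in the same format as the question's string.
sample = "(system tool^0.6207902431488037 applic^0.610009491443634)^0.25"

# Original pattern: three capturing groups, so findall yields one tuple per match.
grouped = re.compile(r"(\w+\^(0|[1-9]\d*)(\.\d+)?)")
first_tuple = grouped.findall(sample)[0]
print(first_tuple)

# Same pattern with non-capturing groups (?:...): findall yields the matched text.
flat = re.compile(r"\w+\^(?:0|[1-9]\d*)(?:\.\d+)?")
matches = flat.findall(sample)
print(matches)
```

The trailing `)^0.25` is not matched by either pattern, because `\w+` requires at least one word character directly before the `^`.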
In the next part, we make two dictionaries. The first maps each prefix (the word part, before ^) to the whole matched value. We use that to build the second dictionary, which maps each value to replace to its new value (taken from the dataframe).
prefix_dict = {}
for m in matches:
    # Split "word^number" into the word and the old number.
    pre, post = m.split('^')
    prefix_dict[pre] = m
print("Prefixes:", prefix_dict)
matches_dict = {}
for i, row in df.iterrows():  # df is the dataframe from the question
    if i in prefix_dict:
        old_val = prefix_dict[i]
        # Note: %s drops trailing zeros, so 0.599030 is written as 0.59903.
        new_val = "%s^%s" % (i, row.newValue)
        matches_dict[old_val] = new_val
print("Matches dict:", matches_dict)
With that done, we can loop through the items in the dictionary that maps old values to new values and replace every old value in the input string.
new_string = old_string
for key, val in matches_dict.items():
    new_string = new_string.replace(key, val)
print("New string:", new_string)
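The stages above could probably also be collapsed into a single pass with `re.sub` and a replacement function. This is only a sketch of that alternative, not the approach used above. It assumes the new values are available as a plain dict (`new_values` here is a hypothetical stand-in that, with the question's dataframe, could be built with `df['newValue'].to_dict()`), and it formats with `%.6f` on the assumption that six decimals are wanted, as in the expected output. The shortened input string and the term `other` are made up to show that unknown words are left untouched.

```python
import re

# Hypothetical stand-in for the question's dataframe: term -> new value.
new_values = {"suggest": 0.683228, "rs": 0.599030, "tool": 0.625563}

# Shortened input; "other" is deliberately missing from new_values.
old_string = ("(recommend suggest^0.6603200435638428 rs^0.5923488140106201)^0.125 "
              "(system other^0.5)^0.25")

# Group 1 captures the word before "^"; the number itself is non-capturing.
pattern = re.compile(r"(\w+)\^(?:0|[1-9]\d*)(?:\.\d+)?")

def swap(match):
    """Return 'word^new_value' for known words, else the original text."""
    term = match.group(1)
    if term in new_values:
        return "%s^%.6f" % (term, new_values[term])
    return match.group(0)

new_string = pattern.sub(swap, old_string)
print(new_string)
```

Because each match is replaced where it is found, this avoids `str.replace` accidentally rewriting a key that happens to occur inside a longer word.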
Upvotes: 1