Reputation: 413
I'm new to Python and PySpark and I'm practicing TF-IDF. I split every sentence in the txt file into words, removed punctuation, removed the words that appear in the stop-words list, and saved the result as a dictionary with the code snippet below.
x = text_file.flatMap(lambda line: str_clean(line).split())
x = x.filter(lambda word: word not in stopwords)
x = x.map(lambda word: (word, 1))  # reduceByKey needs (key, value) pairs
x = x.reduceByKey(lambda a, b: a + b)
x = x.collectAsMap()
I have 10 different txt files that go through this same process, and I'd like to add a string like "@d1" to the keys in the dictionary so that I can indicate that a key comes from document 1.
How can I add "@d1" to all keys in the dictionary?
Essentially my dictionary is in the form:
{'word1': 1, 'word2': 1, 'word3': 2, ....}
And I would like it to be:
{'word1@d1': 1, 'word2@d1': 1, 'word3@d1': 2, ...}
Upvotes: 25
Views: 22738
Reputation: 1457
I have a list of dicts (table_col_list) and add a prefix to every key as shown below:
def prefix_key_dict(prefix, test_dict):
    # Prepend the prefix to the lowercased string form of each key.
    res = {prefix + str(key).lower(): val for key, val in test_dict.items()}
    return res

temp_prefix = 'column_'
transformed_dict = [prefix_key_dict(temp_prefix, each) for each in table_col_list]
and the transformed json looks like below
Upvotes: 1
Reputation: 164673
Try a dictionary comprehension:
{k+'@d1': v for k, v in d.items()}
In Python 3.6+, you can use f-strings:
{f'{k}@d1': v for k, v in d.items()}
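If the same renaming has to be repeated for each of the 10 files, the comprehension can be wrapped in a small helper. This is only a sketch: the name `tag_keys` and the sample counts are made up for illustration, and it assumes the keys are plain strings.

```python
def tag_keys(counts, doc_id):
    """Return a new dict whose keys carry an '@<doc_id>' suffix.

    `tag_keys` is a hypothetical helper name, not part of any library.
    The input dict is left unchanged.
    """
    return {f"{word}@{doc_id}": n for word, n in counts.items()}


# Sample counts in the shape shown in the question.
counts = {"word1": 1, "word2": 1, "word3": 2}
tagged = tag_keys(counts, "d1")
print(tagged)
```

Building a new dict rather than renaming keys in place avoids mutating a dict while iterating over it.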
Upvotes: 41
Reputation: 5666
You can use the dict constructor to rebuild the dict, appending the file number to the end of each key:
>>> d = {'a': 1, 'b': 2}
>>> file_number = 1
>>> dict(("{}@{}".format(k,file_number),v) for k,v in d.items())
{'a@1': 1, 'b@1': 2}
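The same idea might be extended to several files at once, since the suffix keeps a word from one document distinct from the same word in another. The sketch below assumes the per-file dictionaries have already been collected into a list; `per_doc_counts` and its contents are invented for illustration.

```python
# Hypothetical per-document word counts, one dict per file, in file order.
per_doc_counts = [
    {"a": 1, "b": 2},  # document 1
    {"a": 3, "c": 1},  # document 2
]

merged = {}
for file_number, counts in enumerate(per_doc_counts, start=1):
    # dict.update accepts an iterable of (key, value) pairs.
    merged.update(
        ("{}@{}".format(k, file_number), v) for k, v in counts.items()
    )

print(merged)
```

Because every key ends in its own file number, the entry for 'a' in document 1 is not overwritten by the entry for 'a' in document 2.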
Upvotes: 4