CharlesAntoine

Reputation: 83

Create a data frame from dictionaries, which are iteratively generated

I am trying to do web scraping to automate information collection instead of doing it manually.

For a given stock, a function (get_info) returns some information as a dictionary.

Example of an output dictionary

For company A

dict_A = {'enterpriseRevenue': 1.264,
          'profitMargins': -0.00124,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 3907579904,
          'bookValue': 8.326}

For company B

dict_B = {'enterpriseRevenue': 2.789,
          'profitMargins': 2.34,
          'enterpriseToEbitda': 28.328,
          'sharesOutstanding': 2874818942,
          'bookValue': 4.189}

From a list of stocks, I would like to create a data frame containing all the items of the dictionaries returned by the get_info function. Desired algorithm in "natural language":

Create an empty data frame with 6 columns (first column for stock name, rest for dictionary items), called df

for s in list_of_stocks:
    toto = get_info(s) # get the information for the stock, type(toto)=dict
    add a new row to df whose values correspond to toto

Example of desired output

Stock, enterpriseRevenue, profitMargins, enterpriseToEbitda, sharesOutstanding, bookValue
A, 1.264, -0.00124, 28.328, 3907579904, 8.326
B, 2.789, 2.34, 28.328, 2874818942, 4.189

Does anyone have any idea how to build this data frame?

Upvotes: 0

Views: 189

Answers (2)

rada-dev

Reputation: 122

Try to use this.

# with the data structure like this, it might be easier to handle
data = {
    "stockA": {
        'enterpriseRevenue': 1.264,
        'profitMargins': -0.00124,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 3907579904,
        'bookValue': 8.326
    },
    "stockB": {
        'enterpriseRevenue': 2.789,
        'profitMargins': 2.34,
        'enterpriseToEbitda': 28.328,
        'sharesOutstanding': 2874818942,
        'bookValue': 4.189
    }
}

# getting the keys within the stockXY dict, which will be the column names
data_keys = data[list(data.keys())[0]].keys()   # raises IndexError when data dictionary is empty
column_captions = ["stock"]+list(data_keys)
print(", ".join(map(str, column_captions)))

for stock, stock_data in data.items():
    message = stock+", "+", ".join(map(str, stock_data.values()))
    print(message)

It seems like you want to save the data to a text file. If so, you might take a look at json: https://docs.python.org/3/library/json.html
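If saving to a text file is the goal, a minimal sketch with the standard-library json module could look like the following. It reuses the nested layout from above; the round trip through a string is only for illustration, and json.dump / json.load would do the same with an open file object.

```python
import json

# Same nested layout as above: one inner dict per stock.
data = {
    "stockA": {
        "enterpriseRevenue": 1.264,
        "profitMargins": -0.00124,
        "enterpriseToEbitda": 28.328,
        "sharesOutstanding": 3907579904,
        "bookValue": 8.326,
    },
    "stockB": {
        "enterpriseRevenue": 2.789,
        "profitMargins": 2.34,
        "enterpriseToEbitda": 28.328,
        "sharesOutstanding": 2874818942,
        "bookValue": 4.189,
    },
}

# Serialize to a JSON string (use json.dump(data, file_obj) to write a file).
text = json.dumps(data, indent=2)

# Parse it back to check that nothing was lost.
restored = json.loads(text)
print(restored == data)
```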

Upvotes: 1

adir abargil

Reputation: 5745

Did you try:

pd.DataFrame([get_info(d) for d in list_of_stocks])
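The one-liner above gives the five dictionary columns but not the stock name. A possible way to add it, sketched here under assumptions, is to pass the stock names as the index and then turn that index into a "Stock" column. The get_info below is a hypothetical stand-in that returns the sample dictionaries from the question; the real function would do the scraping.

```python
import pandas as pd

# Hypothetical stand-in data, copied from the question's examples.
_SAMPLE = {
    "A": {"enterpriseRevenue": 1.264, "profitMargins": -0.00124,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 3907579904,
          "bookValue": 8.326},
    "B": {"enterpriseRevenue": 2.789, "profitMargins": 2.34,
          "enterpriseToEbitda": 28.328, "sharesOutstanding": 2874818942,
          "bookValue": 4.189},
}


def get_info(stock):
    """Placeholder for the asker's scraping function; returns a dict."""
    return _SAMPLE[stock]


list_of_stocks = ["A", "B"]

# Build all rows first, then create the frame once; the dict keys become
# the columns and the stock names become the index.
rows = [get_info(s) for s in list_of_stocks]
df = pd.DataFrame(rows, index=list_of_stocks)

# Move the index into a regular first column named "Stock".
df = df.rename_axis("Stock").reset_index()
print(df)
```

Collecting the dictionaries in a list and building the frame once is usually simpler than appending rows to an empty data frame inside the loop.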

Upvotes: 2
