rsc05

Reputation: 3800

Python: If an entry is empty, copy the entry above it

I have the following string variable:

data='Industry  \t& Company \t\t\t    \t\t  & Variable Name         \n        Oil       \t& Mobil   \t\t\t    \t\t  & MOBIL         \n                  \t& Texaco  \t\t\t    \t\t  & TEXACO        \n        Computers \t& IBM     \t\t\t    \t\t  & IBM           \n          \t      \t& Digital Equipment Co. \t\t  & DEC  \t\t  \n          \t      \t& Data General \t\t\t\t\t  & DATGEN        \n        Electricity & Consolidated Edison   \t\t  & CONED     \n          \t        & Public Service of New Hampshire & PSNH  \n          \t        & General Public Utilities \t\t  & GPU     \n        Forestry    & Weyerhauser \t\t\t\t\t  & WEYER         \n          \t        & Boise \t\t\t\t\t\t  & BOISE             \n        Electronics & Motorola \t\t\t\t\t\t  & MOTOR           \n          \t        & Tandy \t\t\t\t\t\t  & TANDY             \n        Airlines    & Pan American \t\t\t\t\t  & PANAM         \n          \t        & Delta \t\t\t\t\t\t  & DELTA             \n        Banks       & Continental Illinois \t\t\t  & CONTIL    \n          \t        & Citicorp\t\t\t\t\t\t  & CITCRP          \n        Food        & Gerber \t\t\t\t\t\t  & GERBER            \n          \t        & General Mills \t\t\t\t  & GENMIL        \n        Chemicals   & Dow \t\t\t\t\t\t      & DOW             \n          \t        & Dupont \t\t\t\t\t\t  & DUPONT            \n          \t        & Conoco \t\t\t\t\t\t  & CONOCO            '

I was able to convert it to a pandas DataFrame using the following code (it would be nice if you know an easier way of doing it):

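import numpy as np
import pandas as pd

lines = data.split("\n")
# One row per line, three '&'-separated fields per row
array = np.zeros(shape=(len(lines), 3))
array = array.astype('str')
for i1 in range(len(lines)):
    set1 = lines[i1].split('&')
    # Remove all tabs and spaces from each field
    for i, v in enumerate(set1):
        set1[i] = v.replace('\t', '').replace(' ', '')
    for i2 in range(3):
        array[i1, i2] = set1[i2]

# The first row holds the column names
df = pd.DataFrame(array[1:], columns=array[0])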

So now my df looks as follows:

[screenshot of the DataFrame]

Is there a way to fill the empty cells in the Industry column so that each empty cell copies the value above it? For example, the empty cell in row 1 should become Oil, and the empty cells in rows 3 and 4 should become Computers.

Thank you so much in advance

Upvotes: 1

Views: 41

Answers (2)

Siddharth Reddy

Reputation: 51

Use str.get_dummies()

For details, refer to https://www.geeksforgeeks.org/python-pandas-series-str-get_dummies/

Upvotes: 0

Vivek Kalyanarangan

Reputation: 9081

Use:

df['Industry'] = df['Industry'].replace('', np.nan).ffill()

Output

0             Oil
1             Oil
2       Computers
3       Computers
4       Computers
5     Electricity
6     Electricity
7     Electricity
8        Forestry
9        Forestry
10    Electronics
11    Electronics
12       Airlines
13       Airlines
14          Banks
15          Banks
16           Food
17           Food
18      Chemicals
19      Chemicals
20      Chemicals
Name: Industry, dtype: object
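As for the "easier way" to build the table, here is a minimal sketch that combines parsing and filling in one go. It assumes the fields are always separated by '&' and uses a shortened sample of the question's string (the sample rows and the use of io.StringIO are my additions, not part of the original post). Blank cells are kept as empty strings during parsing, then replaced with NaN so that ffill() has something to fill.

```python
import io

import numpy as np
import pandas as pd

# Shortened sample in the same layout as the question's `data` string
# (assumption: every row has exactly three '&'-separated fields).
data = (
    "Industry  \t& Company \t\t & Variable Name   \n"
    "  Oil     \t& Mobil   \t\t & MOBIL   \n"
    "          \t& Texaco  \t\t & TEXACO  \n"
    "  Computers \t& IBM    \t\t & IBM     \n"
    "  \t        \t& Digital Equipment Co. \t & DEC \t \n"
    "  \t        \t& Data General \t\t & DATGEN  "
)

# Parse with '&' as the delimiter; read everything as text and keep
# blank fields as strings instead of letting pandas convert them.
df = pd.read_csv(io.StringIO(data), sep="&", dtype=str, keep_default_na=False)

# Strip the padding (spaces and tabs) from the headers and from every cell.
df.columns = df.columns.str.strip()
df = df.apply(lambda col: col.str.strip())

# Empty strings are not treated as missing, so convert them to NaN first,
# then forward-fill each gap with the last non-missing value above it.
df["Industry"] = df["Industry"].replace("", np.nan).ffill()

print(df)
```

Note that strip() only removes leading and trailing whitespace, so names such as "Digital Equipment Co." keep their internal spaces, whereas the replace(' ', '') approach in the question removes them. If the original behaviour is wanted, the strip step can be swapped for a replace.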

Upvotes: 1
