lmonty

How to split column from DataFrame with Pandas

I am reading a CSV file from an API call into a data frame with pandas for some data manipulation.

Currently, I'm getting this response:

In [78]: dfname
Out[78]: 
        productID  amountInStock  index  index_col
7             1.0            NaN      1          7
19            4.0            NaN      2         19
20            1.0            NaN      3         20
22            2.0            NaN      4         22

I then call dfname.reset_index() to create a better index:

In [80]: dfname.reset_index()
Out[80]: 
      level_0  productID  amountInStock  index  index_col
0           7        1.0            NaN      1          7
1          19        4.0            NaN      2         19
2          20        1.0            NaN      3         20
3          22        2.0            NaN      4         22
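
The new column is called level_0 rather than index because the frame already has a column literally named index. When that name is taken, reset_index falls back to level_0. Note also that reset_index returns a new DataFrame, so the result has to be assigned back (for example dfname = dfname.reset_index()) for dfname itself to change. Below is a minimal sketch with a hypothetical frame built to mirror the output above (only two of its columns are kept):

```python
import pandas as pd

# Hypothetical frame mirroring the output above: index 7, 19, 20, 22
# and an existing column literally named "index"
df = pd.DataFrame(
    {"productID": [1.0, 4.0, 1.0, 2.0], "index": [1, 2, 3, 4]},
    index=[7, 19, 20, 22],
)

reset = df.reset_index()
# "index" is already taken, so the old index is inserted as "level_0"
print(reset.columns.tolist())  # ['level_0', 'productID', 'index']

# reset_index returned a new frame; df still has its original index
print(df.index.tolist())  # [7, 19, 20, 22]
```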

But the problem is that the 'productID' series has two columns and I can't work out how to split them!

In [82]: dfname.productID
Out[82]: 
7          1.0
19         4.0
20         1.0
22         2.0

What I want is dfname.productID to return:

dfname.productID
Out[82]: 
7          
19         
20         
22         

and the other figures currently in productID should be assigned to 'stockqty'.

How do I split this field so that it returns two columns instead of one? I've tried .str.split() to no avail.

The properties of the object are Name: productID, Length: 2102, dtype: float64
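
The dtype: float64 explains why .str.split() gets nowhere. The .str accessor only works on string values, and accessing it on a float Series raises an AttributeError. Below is a minimal sketch with a hypothetical four-row Series standing in for dfname.productID:

```python
import pandas as pd

# Hypothetical stand-in for dfname.productID: float64 values on an integer index
s = pd.Series([1.0, 4.0, 1.0, 2.0], index=[7, 19, 20, 22], name="productID")
print(s.dtype)  # float64

try:
    s.str.split()
    str_split_worked = True
except AttributeError as exc:
    # The .str accessor refuses non-string data, so split never runs
    str_split_worked = False
    print("str.split failed:", exc)
```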

Answers (2)

lmonty

I resolved this by specifying the separator when parsing the CSV:

import pandas as pd
df = pd.read_csv(link, encoding='ISO-8859-1', sep=', ', engine='python')  # multi-character sep needs the python engine
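
For reference, read_csv treats a separator longer than one character (other than '\s+') as a regular expression, and only the Python parsing engine supports that. Passing engine='python' explicitly avoids the fallback warning. The real file behind link isn't shown, so the sketch below assumes a hypothetical CSV whose fields are separated by a comma followed by a space:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the API response: ", " between fields
raw = "productID, stockqty\n7, 1.0\n19, 4.0\n20, 1.0\n22, 2.0\n"

# With the default sep=',' the space after each comma stays in the header name
default = pd.read_csv(io.StringIO(raw))
print(default.columns.tolist())  # ['productID', ' stockqty']

# A two-character separator is treated as a regex, handled by the python engine
fixed = pd.read_csv(io.StringIO(raw), sep=", ", engine="python")
print(fixed.columns.tolist())  # ['productID', 'stockqty']
print(fixed["productID"].tolist())  # [7, 19, 20, 22]
```

If the delimiter is really "comma plus optional spaces", sep=',' combined with skipinitialspace=True is another option that keeps the default C engine.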

jpp

> But the problem is that the 'productID' series has two columns and I can't work out how to split them!

Therein lies the misunderstanding. You don't have two columns, despite what print shows. You have one column of values plus an index: the left-hand numbers are row labels, not data. This is precisely how a pd.Series object is defined.
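
You can confirm that the printout is one column plus an index by pulling the two parts out separately. Below is a minimal sketch with a hypothetical Series built to match the dfname.productID output in the question:

```python
import pandas as pd

# Hypothetical Series matching dfname.productID from the question
s = pd.Series([1.0, 4.0, 1.0, 2.0], index=[7, 19, 20, 22], name="productID")

# The left-hand numbers in the printout are the index (row labels)...
print(s.index.tolist())  # [7, 19, 20, 22]
# ...and only the right-hand numbers are the column's data
print(s.tolist())  # [1.0, 4.0, 1.0, 2.0]

# Label-based lookup goes through the index, so label 19 selects 4.0
print(s.loc[19])  # 4.0
```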

> What I want is dfname.productID to return:

As above, this isn't possible. Every series has an index. This is non-negotiable.

> How do I split this field so that it returns two columns instead of one? I've tried .str.split() to no avail.

This isn't the way forward. In particular, note that pd.Series.str.split is for splitting string values within a series, and your column has dtype float64, so there are no strings to split. Instead, use reset_index and rename the resulting column, or name your index before calling reset_index. The latter option seems cleaner to me:

df.index.name = 'stockqty'
df = df.reset_index()

print(df)

   stockqty  productID  amountInStock  index  index_col
0         7        1.0            NaN      1          7
1        19        4.0            NaN      2         19
2        20        1.0            NaN      3         20
3        22        2.0            NaN      4         22
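
The other option, calling reset_index first and renaming afterwards, has one wrinkle here. The frame already has a column named index, so the old index comes back as level_0, and that is the name to rename. The sketch below uses the naming the question asks for: the index values become productID and the current productID values become stockqty. The frame is a hypothetical reconstruction of the one printed above:

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's frame (index 7, 19, 20, 22)
df = pd.DataFrame(
    {
        "productID": [1.0, 4.0, 1.0, 2.0],
        "amountInStock": [np.nan] * 4,
        "index": [1, 2, 3, 4],
        "index_col": [7, 19, 20, 22],
    },
    index=[7, 19, 20, 22],
)

# reset_index names the old index "level_0" because "index" is already a column;
# rename maps each original label once, so the two renames don't interfere
out = df.reset_index().rename(columns={"level_0": "productID", "productID": "stockqty"})
print(out)
```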
