Reputation: 359
Hi, I'm learning data science and am trying to build a list of big data companies from a list of companies in various industries.
I have a list of row numbers for the big data companies, named comp_rows. Now I'm trying to make a new dataframe containing only those companies, based on the row numbers. To do this I need to add rows to an existing dataframe, but I get an error. Could someone help?
My dataframe looks like this:
company_url company tag_line product data
0 https://angel.co/billguard BillGuard The fastest smartest way to track your spendin... BillGuard is a personal finance security app t... New York City · Financial Services · Security ...
1 https://angel.co/tradesparq Tradesparq The world's largest social network for global ... Tradesparq is Alibaba.com meets LinkedIn. Trad... Shanghai · B2B · Marketplaces · Big Data · Soc...
2 https://angel.co/sidewalk Sidewalk Hoovers (D&B) for the social era Sidewalk helps companies close more sales to s... New York City · Lead Generation · Big Data · S...
3 https://angel.co/pangia Pangia The Internet of Things Platform: Big data mana... We collect and manage data from sensors embedd... San Francisco · SaaS · Clean Technology · Big ...
4 https://angel.co/thinknum Thinknum Financial Data Analysis Thinknum is a powerful web platform to value c... New York City · Enterprise Software · Financia...
My code is below:
bigdata_comp = DataFrame(data=None,columns=['company_url','company','tag_line','product','data'])
for count, item in enumerate(data.iterrows()):
    for number in comp_rows:
        if int(count) == int(number):
            bigdata_comp.append(item)
Error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-234-1e4ea9bd9faa> in <module>()
4 for number in comp_rows:
5 if int(count) == int(number):
----> 6 bigdata_comp.append(item)
7
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.pyc in append(self, other, ignore_index, verify_integrity)
3814 from pandas.tools.merge import concat
3815 if isinstance(other, (list, tuple)):
-> 3816 to_concat = [self] + other
3817 else:
3818 to_concat = [self, other]
TypeError: can only concatenate list (not "tuple") to list
Upvotes: 4
Views: 38223
Reputation: 16144
It seems you are trying to filter an existing dataframe based on indices (which are stored in your variable called comp_rows). You can do this without loops by using loc, as shown below:
In [1161]: df1.head()
Out[1161]:
A B C D
a 1.935094 -0.160579 -0.173458 0.433267
b 1.669632 -1.130893 -1.210353 0.822138
c 0.494622 1.014013 0.215655 1.045139
d -0.628889 0.223170 -0.616019 -0.264982
e -0.823133 0.385790 -0.654533 0.582255
We will get the rows with indices 'a','b' and 'c', for all columns:
In [1162]: df1.loc[['a','b','c'],:]
Out[1162]:
A B C D
a 1.935094 -0.160579 -0.173458 0.433267
b 1.669632 -1.130893 -1.210353 0.822138
c 0.494622 1.014013 0.215655 1.045139
You can read more about it here.
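Applied to the question, the same idea might look like the sketch below. It assumes comp_rows holds integer row positions, which is why it uses iloc (position-based) rather than loc (label-based). The small data frame and the values in comp_rows are made up for illustration and are not the asker's real data.

```python
import pandas as pd

# Toy stand-in for the asker's dataframe (values are illustrative only).
data = pd.DataFrame({
    "company": ["BillGuard", "Tradesparq", "Sidewalk", "Pangia", "Thinknum"],
    "data": [
        "Financial Services",
        "B2B, Big Data",
        "Lead Generation, Big Data",
        "SaaS, Big Data",
        "Enterprise Software",
    ],
})

# Hypothetical row positions of the big data companies.
comp_rows = [1, 2, 3]

# iloc selects rows by integer position; no loop or append is needed.
bigdata_comp = data.iloc[comp_rows]
print(bigdata_comp["company"].tolist())
```

With a default 0..n-1 index, data.loc[comp_rows] would select the same rows, since labels and positions coincide in that case.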
About your code:
1.
You do not need to iterate through a list to see if an item is present in it. Use the in operator. For example:
In [1199]: 1 in [1,2,3,4,5]
Out[1199]: True
So, instead of
    for number in comp_rows:
        if int(count) == int(number):
do this:
    if count in comp_rows:
2.
pandas append does not happen in place. You have to store the result in another variable. See here.
3.
Appending one row at a time is a slow way to do what you want. Instead, save each row that you want to add into a list of lists, make a dataframe from it, and append it to the target dataframe in one go. Something like this:
temp = []
for count, item in enumerate(df1.loc[['a','b','c'],:].iterrows()):
    # if count in comp_rows:
    temp.append(list(item[1]))
In [1233]: temp
Out[1233]:
[[1.9350940285526077,
-0.16057932637141861,
-0.17345827000000605,
0.43326722021644282],
[1.66963201034217,
-1.1308932586268696,
-1.2103527446031515,
0.82213753819050794],
[0.49462218161377397,
1.0140133740187862,
0.2156547595968879,
1.0451391564351897]]
In [1236]: df2 = df1.append(pd.DataFrame(temp, columns=['A','B','C','D']))
In [1237]: df2
Out[1237]:
A B C D
a 1.935094 -0.160579 -0.173458 0.433267
b 1.669632 -1.130893 -1.210353 0.822138
c 0.494622 1.014013 0.215655 1.045139
d -0.628889 0.223170 -0.616019 -0.264982
e -0.823133 0.385790 -0.654533 0.582255
f -0.872135 2.938475 -0.099367 -1.472519
0 1.935094 -0.160579 -0.173458 0.433267
1 1.669632 -1.130893 -1.210353 0.822138
2 0.494622 1.014013 0.215655 1.045139
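The three points above can be combined in one sketch. Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so this version swaps in pd.concat, which likewise returns a new object rather than modifying anything in place. The frame df1, the list comp_rows, and the other names here are illustrative assumptions, not the question's real data.

```python
import pandas as pd

# Small illustrative frame (not the data from the question).
df1 = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [10.0, 20.0, 30.0, 40.0]},
    index=["a", "b", "c", "d"],
)
comp_rows = [0, 2]  # hypothetical row positions to keep

# Point 1: test membership with `in` instead of an inner loop.
# Point 3: collect the wanted rows in a plain list first.
temp = []
for count, (index, row) in enumerate(df1.iterrows()):
    if count in comp_rows:
        temp.append(list(row))

# Build the new rows once, then combine in a single call.
new_rows = pd.DataFrame(temp, columns=df1.columns)

# Point 2: the result is a new dataframe and must be assigned; df1 is unchanged.
df2 = pd.concat([df1, new_rows])
print(df2)
```

As in the output above, the added rows get a fresh 0, 1, ... index; passing ignore_index=True to pd.concat would renumber the whole result instead.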
Upvotes: 8
Reputation: 10970
Replace the following line:
for count, item in enumerate(data.iterrows()):
by
for count, (index, item) in enumerate(data.iterrows()):
or even simply as
for count, item in data.iterrows():
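The reason this helps: iterrows() yields (index, Series) tuples, so in the original loop item was a tuple, which is what append rejected in the traceback. A small sketch with a made-up two-row frame shows the difference:

```python
import pandas as pd

# Toy data; only the shape of what iterrows() yields matters here.
data = pd.DataFrame({"company": ["BillGuard", "Tradesparq"]})

# Without unpacking, each element is an (index, row) tuple.
first = next(data.iterrows())
print(type(first))

# With unpacking, `row` is the pandas Series holding that row's values.
for count, (index, row) in enumerate(data.iterrows()):
    print(count, index, row["company"])
```

In the last form above, count is really the row's index label. It matches the row position only when the dataframe has the default 0..n-1 index.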
Upvotes: 0