[USVC] Drawing Supply Chain 2 – US Listed Domestic Firms


<Histograms of firm out-nodes, clockwise from top left: 2000, 2005, 2010, 2015>


  • The total number of sample firms fell overall from 9,062 in 1998 to 5,850 in 2016, with a temporary rebound in 2008 and 2009: {1998: 9062, 1999: 8906, 2000: 8512, 2001: 8167, 2002: 7692, 2003: 7447, 2004: 7498, 2005: 7015, 2006: 6690, 2007: 6676, 2008: 7448, 2009: 7792, 2010: 7427, 2011: 7223, 2012: 6943, 2013: 6783, 2014: 6640, 2015: 6231, 2016: 5850}
  • The number of edges (links between firms), however, decreased even further over the same period
  • The average shortest path length over all possible linkages: {2000: 1.638, 2005: 1.531, 2010: 1.284, 2015: 1.322}
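The two statistics above (out-node histogram and average shortest path length) can be sketched in plain Python. This is only an illustration under assumptions: the post does not say which library or definition was used, so the sketch assumes a directed supplier -> customer edge list and averages breadth-first-search distances over reachable ordered pairs only. The function names and the toy firms A to D are hypothetical.

```python
from collections import Counter, deque


def build_graph(edges):
    """Adjacency sets for a directed graph given (supplier, customer) pairs."""
    graph = {}
    for supplier, customer in edges:
        graph.setdefault(supplier, set()).add(customer)
        graph.setdefault(customer, set())
    return graph


def out_degree_histogram(edges):
    """Map each out-degree value to the number of firms that have it."""
    graph = build_graph(edges)
    return dict(Counter(len(targets) for targets in graph.values()))


def average_shortest_path_length(edges):
    """Mean BFS distance over ordered pairs (u, v), u != v, with v reachable from u.

    Assumption: unreachable pairs are skipped rather than counted as infinite.
    """
    graph = build_graph(edges)
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1  # exclude the source itself
    return total / pairs if pairs else 0.0


# Hypothetical toy network, not taken from the 10-K data.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
print(out_degree_histogram(toy_edges))
print(average_shortest_path_length(toy_edges))
```

Skipping unreachable pairs keeps the average finite on a sparse directed network, but it also means the value can fall simply because long chains disappear, which is worth keeping in mind when reading the yearly figures.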


Possible explanations

  • The (trained natural-language) model may be overfitted to the early 2000s
  • Supply chain links among U.S. firms may actually be decreasing due to economic uncertainty


10 Firms with the most in-nodes

(year 2000) : [['Walmart Inc', 0.027717626678215677], ['Lucent Technologies Inc', 0.026634906886097875], ['Hewlett Packard Enterprise Co', 0.023170203551320916], ['AT&T Corp', 0.018189692507579038], ['Ford Inc', 0.0173235166738848], ['Cisco Systems Inc', 0.01602425292334344], ['Siemens AG', 0.013858813339107838], ['Boeing Corp', 0.013642269380684278], ['Intel Corp', 0.012126461671719359], ['Target Inc', 0.01169337375487224]]

(year 2015) : [['Walmart Inc', 0.020942408376963352], ['AT&T Corp', 0.010732984293193719], ['Ford Inc', 0.010209424083769635], ['Shell Oil Co', 0.009947643979057593], ['Target Inc', 0.00968586387434555], ['Home Depot Inc.', 0.009162303664921467], ['Cisco Inc', 0.008638743455497384], ['Microsoft Corp', 0.008638743455497384]]
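The scores in the lists above look like normalized in-degree, i.e. the number of incoming links divided by (number of firms in the network - 1). That reading is an assumption, since the post does not state how the values were computed. Under that assumption, a ranking of this shape could be produced roughly as follows; the function name and the toy firms are hypothetical.

```python
from collections import Counter


def top_in_degree_centrality(edges, k=10):
    """Return the k firms with the highest in-degree / (n - 1).

    `edges` holds (supplier, customer) pairs; duplicates are counted once.
    """
    unique_edges = set(edges)
    nodes = set()
    in_degree = Counter()
    for supplier, customer in unique_edges:
        nodes.update((supplier, customer))
        in_degree[customer] += 1
    denom = len(nodes) - 1
    if denom <= 0:
        return []
    # Sort by descending in-degree, then by name for a stable order.
    ranked = sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))
    return [[name, count / denom] for name, count in ranked[:k]]


# Hypothetical toy edges, not taken from the 10-K data.
toy_edges = [
    ("S1", "BigRetail"),
    ("S2", "BigRetail"),
    ("S3", "BigRetail"),
    ("S3", "Telco"),
]
print(top_in_degree_centrality(toy_edges, k=2))
```

Because the denominator depends on the size of each year's network, scores from 2000 and 2015 are not directly comparable as raw link counts.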




I found that customer information is stated in two forms: as sentences and as tables.

Hence, I started by dividing the 10-k text into two categories, text and table, using the html tags.

The method for dealing with both, however, is similar: TEXT CLASSIFICATION
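A minimal sketch of the text/table split, assuming the filing is available as an HTML string and that "table" means anything inside a `<table>` element. It uses only the standard-library `html.parser`; the class name, function name, and sample HTML are hypothetical, and real 10-k filings are messier than this.

```python
from html.parser import HTMLParser


class TextTableSplitter(HTMLParser):
    """Collect text inside <table> elements separately from all other text."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while inside (possibly nested) tables
        self.text_parts = []      # text outside any table
        self.tables = []          # one flattened string per top-level table
        self._current_table = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            if self.table_depth == 0:
                self._current_table = []
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1
            if self.table_depth == 0:
                self.tables.append(" ".join(self._current_table))

    def handle_data(self, data):
        chunk = " ".join(data.split())  # normalize whitespace
        if not chunk:
            return
        if self.table_depth > 0:
            self._current_table.append(chunk)
        else:
            self.text_parts.append(chunk)


def split_text_and_tables(html):
    parser = TextTableSplitter()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.tables


# Hypothetical fragment for illustration only.
sample = (
    "<p>Fleetwood was our largest customer.</p>"
    "<table><tr><td>Customer A</td><td>31%</td></tr></table>"
    "<p>Other text.</p>"
)
text, tables = split_text_and_tables(sample)
print(text)
print(tables)
```

Flattening each table to a single string loses the row and column layout; a version meant for the table-type classifier would probably keep cells per row instead.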

*GOAL 1 (sentence form):

Classify each sentence as relevant to customer information or not.

Example Sentences:

Net sales to the Company’s three major customers, Staples, Inc., Office Max, and United Stationers, Inc., represented approximately 43% in 2004, 46% in 2003 and 46% in 2002.

For fiscal 2003, Fujitsu accounted for approximately 31 percent of our consolidated accounts receivable and approximately 13 percent of our consolidated gross sales.

In 2004, Matyep in Mexico represented 11.0% of our consolidated revenues and Burlington Resources Inc. represented 10.1%.

Fleetwood was the Company’s largest customer in 2004, representing approximately 31% of total sales.

I hoped that a set of rules or sentence structures could cover all the customer information in the 10-k. I tried to find such rules manually and ended up with 24 kinds of sentences. These rules find every sentence that contains the customer information listed in the Compustat data (my reference point throughout this research), but some of the sentences they match have nothing to do with revenue information.

To get rid of those irrelevant sentences, I adopted machine learning techniques.
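To illustrate what a rule-based filter of this kind might look like, here is a small sketch. The 24 rules themselves are not listed in the post, so the two patterns below are hypothetical stand-ins: a sentence is kept when it contains both a percentage and a cue word such as "customer", "accounted for", or "represented".

```python
import re

# Hypothetical stand-ins for the hand-written rules.
PERCENT = re.compile(r"\d+(?:\.\d+)?\s*(?:%|percent)", re.IGNORECASE)
CUE = re.compile(
    r"\b(?:customers?|accounted for|represent(?:ed|ing|s)?)\b", re.IGNORECASE
)


def is_candidate(sentence):
    """True if the sentence has both a percentage and a customer/revenue cue."""
    return bool(PERCENT.search(sentence) and CUE.search(sentence))


examples = [
    # Two of the example sentences quoted earlier in this post.
    "For fiscal 2003, Fujitsu accounted for approximately 31 percent of our "
    "consolidated accounts receivable and approximately 13 percent of our "
    "consolidated gross sales.",
    "Fleetwood was the Company's largest customer in 2004, representing "
    "approximately 31% of total sales.",
    # Hypothetical sentences that are not about customers.
    "Interest expense represented 4% of operating costs.",
    "The Company was incorporated in Delaware in 1987.",
]
flags = [is_candidate(s) for s in examples]
print(flags)
```

The third sentence passes the filter even though it says nothing about customers, which is the kind of false positive that motivates a classification step on top of the rules.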

*Annotation (GOLD-STANDARD);









[USVC 1] Scraping 10-k disclosure DATA from SEC

  • I think the site below offers a more precise way to obtain the 10-k data.



  • I did this when I had just started learning python, so I simply plugged each CIK number from my set, one after another, into "10-k with CIK number" and downloaded the results.
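The loop over CIK numbers could look roughly like the sketch below. The exact address used in the original run is not shown, so the EDGAR company-browse endpoint and its query parameters here are an assumption, and the two CIKs are only examples. The sketch builds the URLs and does not send any requests.

```python
from urllib.parse import urlencode

# Assumption: the EDGAR company-browse endpoint, filtered to 10-K filings.
EDGAR_BROWSE = "https://www.sec.gov/cgi-bin/browse-edgar"


def filing_index_url(cik, form_type="10-K", count=40):
    """Build the listing URL for one firm's filings of a given form type."""
    query = urlencode(
        {
            "action": "getcompany",
            "CIK": str(cik).zfill(10),  # EDGAR CIKs are zero-padded to 10 digits
            "type": form_type,
            "owner": "include",
            "count": count,
        }
    )
    return f"{EDGAR_BROWSE}?{query}"


# Example CIK set; the real set would come from the sample of firms.
ciks = [320193, 789019]
urls = [filing_index_url(cik) for cik in ciks]
for url in urls:
    print(url)
```

Actually fetching these pages would also need a descriptive User-Agent header and a pause between requests, since the SEC limits automated access.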





[Ongoing Project] Drawing Supply Chain from SEC 10-k DATA (using python)

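I boldly wrote the title in English, but the body was originally written in Korean.

  • Goal: draw the supply chain among firms in the United States


  • Data: the 10-k data that U.S. firms report to the SEC

To keep firms' revenue information more transparent and to warn of the risk that comes from concentrated revenue, a firm must report in its 10-k when the revenue it earns from a single customer (firm) exceeds 10% of its total revenue.

I do not remember the relevant regulation well at the moment, so I leave it blank. (I recall it taking effect in the early 2000s.)

  • Why it matters: understanding the dynamics between firms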
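As a sketch of how the 10% reporting threshold described above turns into network edges: each disclosed (supplier, customer, revenue share) record becomes a directed supplier -> customer link when the share exceeds the threshold. The record layout, the function name, and the supplier names are hypothetical; the two customer shares mirror example sentences quoted elsewhere on this blog.

```python
def build_edges(records, threshold=0.10):
    """Keep supplier -> customer links whose revenue share exceeds the threshold.

    `records` holds (supplier, customer, share) tuples with share as a fraction.
    """
    return [
        (supplier, customer)
        for supplier, customer, share in records
        if share > threshold
    ]


# Supplier names are placeholders; the shares are illustrative.
records = [
    ("Supplier A", "Fujitsu", 0.13),
    ("Supplier B", "Fleetwood", 0.31),
    ("Supplier B", "Small Customer", 0.04),
]
edges = build_edges(records)
print(edges)
```

With edges pointing from supplier to customer, a firm that many suppliers depend on ends up with many incoming links.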