[USVC] Drawing Supply Chain 1 – Small Sample

— Title : [Python; NetworkX] Supply Chain analysis
— Key word : networkx, Node, Edge, Centrality, Supply Chain, Value Chain


  • About 200 major firms listed in Compustat
  • The data set will soon cover every firm with a CIK code
  • Customer information extracted from 10-K disclosure data


  • Drawn with the basic networkx drawing tools (nx.draw())
  • Year : one plot per year, in order: 2000, 2005, 2010, 2015
  • Size of node : in_degree_centrality
  • Color of node : out_degree_centrality


Sample Code :

(Reference : https://briandew.wordpress.com/2016/06/15/trade-network-analysis-why-centrality-matters/)

import networkx as nx
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

def draw_G(G, year):
    # Out-degree centrality, scaled by 10, drives the node color.
    oc = {node: value * 10 for node, value in nx.out_degree_centrality(G).items()}
    nx.set_node_attributes(G, name='cent', values=oc)
    # In-degree centrality drives the node size.
    ic = nx.in_degree_centrality(G)
    nx.set_node_attributes(G, name='in', values=ic)
    # G.nodes replaces G.node, which was removed in networkx 2.4.
    node_size = [float(G.nodes[v]['in']) * 20000 + 1 for v in G]
    node_color = [float(G.nodes[v]['cent']) for v in G]
    pos = nx.spring_layout(G, k=30, iterations=8)
    nodes = nx.draw_networkx_nodes(G, pos, node_size=node_size, node_color=node_color, alpha=0.5)
    # nodes = nx.draw_networkx_nodes(G, pos, node_color=node_color, alpha=0.5)
    edges = nx.draw_networkx_edges(G, pos, edge_color='black', arrows=True, width=0.3)
    nx.draw_networkx_labels(G, pos, font_size=5)
    plt.text(0, -1.2, 'Node color is out_degree_centrality', fontsize=7)
    plt.title('Compustat firms Supply Chain (year : ' + str(year) + ')', fontsize=12)
    # Fix the color scale on the mappable itself; Colorbar.set_clim no longer exists in matplotlib 3.x.
    nodes.set_clim(0, 1)
    cbar = plt.colorbar(mappable=nodes, fraction=0.015, pad=0.04)
    plt.savefig(str(year) + 'Supply Chain.png', dpi=1000)
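
To make the two measures used in draw_G concrete, here is a small pure-Python sketch that computes in-degree and out-degree centrality by hand for a toy edge list. It uses the same normalization as networkx (degree divided by n - 1). The firm names and the supplier-to-customer edge direction are assumptions for illustration only, and the function expects at least two nodes and no duplicate edges.

```python
def degree_centralities(edges):
    """Return (in_centrality, out_centrality) dicts for a directed edge list."""
    nodes = {n for edge in edges for n in edge}
    scale = 1.0 / (len(nodes) - 1)  # same normalization as networkx: degree / (n - 1)
    in_deg = dict.fromkeys(nodes, 0)
    out_deg = dict.fromkeys(nodes, 0)
    for source, target in edges:
        out_deg[source] += 1
        in_deg[target] += 1
    in_cent = {n: d * scale for n, d in in_deg.items()}
    out_cent = {n: d * scale for n, d in out_deg.items()}
    return in_cent, out_cent


# Hypothetical toy chain: A and B sell to C, and C sells to D.
toy_edges = [("A", "C"), ("B", "C"), ("C", "D")]
in_cent, out_cent = degree_centralities(toy_edges)
print(in_cent["C"], out_cent["C"])
```

Under that edge direction, a firm with many reported suppliers gets a large node, and a firm that names many major customers gets a strong color.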


  • Longest path :


I found that customer information appears in two forms.

One is sentence form and the other is table form.

Hence, I started by dividing the 10-K text into two categories, text and table, using HTML tags.

The methods for handling them, however, are similar: TEXT CLASSIFICATION
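
A minimal sketch of the text/table split, assuming that "table" content means anything nested inside a <table> element and that everything else counts as text. It uses only the standard library's html.parser. The class and function names are hypothetical, not the code actually used for this project.

```python
from html.parser import HTMLParser


class TableTextSplitter(HTMLParser):
    """Collect text inside <table> elements separately from all other text."""

    def __init__(self):
        super().__init__()
        self.table_depth = 0      # > 0 while we are inside at least one <table>
        self.table_chunks = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1

    def handle_data(self, data):
        data = data.strip()
        if not data:
            return
        if self.table_depth > 0:
            self.table_chunks.append(data)
        else:
            self.text_chunks.append(data)


def split_filing(html):
    """Return (text_chunks, table_chunks) for one filing's HTML."""
    parser = TableTextSplitter()
    parser.feed(html)
    parser.close()
    return parser.text_chunks, parser.table_chunks


sample = ("<p>Fleetwood was our largest customer.</p>"
          "<table><tr><td>Fujitsu</td><td>13%</td></tr></table>")
text_chunks, table_chunks = split_filing(sample)
print(text_chunks, table_chunks)
```

Real filings are messier than this sample (tables used purely for layout, for example), so the split is only a starting point for the classification step.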

*GOAL 1 (sentence form):

Classify each sentence as relevant or irrelevant to customer information.

Example Sentences:

Net sales to the Company’s three major customers, Staples, Inc., Office Max, and United Stationers, Inc., represented approximately 43% in 2004, 46% in 2003 and 46% in 2002.

For fiscal 2003, Fujitsu accounted for approximately 31 percent of our consolidated accounts receivable and approximately 13 percent of our consolidated gross sales.

In 2004, Matyep in Mexico represented 11.0% of our consolidated revenues and Burlington Resources Inc. represented 10.1%.

Fleetwood was the Company’s largest customer in 2004, representing approximately 31% of total sales.

I hoped that a set of rules or sentence structures could cover all the customer information in a 10-K. I searched for such rules manually and ended up with 24 sentence patterns. Together they find every sentence containing customer information listed in Compustat (my reference point throughout this research), but some of the sentences they match have nothing to do with revenue information.
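
As an illustration of rule-based filtering, the sketch below applies two regular expressions to candidate sentences. These two patterns are hypothetical stand-ins and not the 24 rules described above. The last sentence is an invented example showing how such rules can match text that has nothing to do with revenue concentration.

```python
import re

# Hypothetical patterns for illustration only; not the author's 24 rules.
PERCENT = r"\b\d+(?:\.\d+)?\s*(?:%|percent)"
RULES = [
    # "customer(s)" followed later by a percentage
    re.compile(r"\bcustomers?\b.*" + PERCENT, re.IGNORECASE),
    # "accounted for" / "represented" / "representing" followed by a percentage
    re.compile(r"\b(?:accounted for|represent(?:ed|ing))\b.*" + PERCENT, re.IGNORECASE),
]


def matches_any_rule(sentence):
    """True if at least one rule pattern is found in the sentence."""
    return any(rule.search(sentence) for rule in RULES)


candidates = [
    "Fleetwood was the Company's largest customer in 2004, "
    "representing approximately 31% of total sales.",
    "The Company was founded in 1985.",
    # Invented sentence: matched by the first rule, but not revenue information.
    "Our customers received a 5% discount on shipping.",
]
for sentence in candidates:
    print(matches_any_rule(sentence), "|", sentence)
```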

To get rid of those irrelevant sentences, I adopted machine learning techniques.
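
The post does not say which learning algorithm was used, so the following is only a sketch of the general idea, using a tiny bag-of-words Naive Bayes classifier written in plain Python. The algorithm choice, the "relevant"/"irrelevant" labels, and the two negative training sentences are all assumptions made for illustration. The two positive sentences are taken from the examples above.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase words; every number collapses to the single token '<num>'."""
    tokens = re.findall(r"[a-z]+|\d+(?:\.\d+)?", sentence.lower())
    return ["<num>" if tok[0].isdigit() else tok for tok in tokens]


class NaiveBayes:
    def fit(self, sentences, labels):
        self.label_counts = Counter(labels)
        self.word_counts = {label: Counter() for label in self.label_counts}
        for sentence, label in zip(sentences, labels):
            self.word_counts[label].update(tokenize(sentence))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, sentence):
        total = sum(self.label_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(self.label_counts[label] / total)
            for tok in tokenize(sentence):
                score += math.log((counts[tok] + 1) / denom)  # Laplace smoothing
            scores[label] = score
        return max(scores, key=scores.get)


train = [
    ("Fleetwood was the Company's largest customer in 2004, "
     "representing approximately 31% of total sales.", "relevant"),
    ("For fiscal 2003, Fujitsu accounted for approximately 31 percent of our "
     "consolidated accounts receivable and approximately 13 percent of our "
     "consolidated gross sales.", "relevant"),
    # Invented negatives, for illustration only.
    ("Our customers received a 5% discount on shipping costs.", "irrelevant"),
    ("The Company offers customers a warranty of 12 months on all products.", "irrelevant"),
]
model = NaiveBayes().fit([s for s, _ in train], [y for _, y in train])
print(model.predict("Burlington Resources represented 10.1% of our consolidated revenues."))
```

With only four training sentences this is a toy; in practice the classifier would be trained on the annotated gold-standard sentences.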

*Annotation (GOLD-STANDARD):









[USVC 1] Scraping 10-k disclosure DATA from SEC

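  • I think the site below offers a more precise way to obtain 10-K data.



  • I did this when I had just started learning Python, so I simply substituted each CIK number from the set I already had, one after another, into "10-k with CIK number" and downloaded the results.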
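
The substitution approach described above can be sketched as follows. The URL template, the CIK values, and the function names are placeholders (the actual address is not given in this post), and the User-Agent string is something you would supply yourself. Only the URL-building step runs here; download_all is defined but not called.

```python
import time
import urllib.request


def build_urls(url_template, ciks):
    """Substitute each CIK into the template, giving one URL per firm."""
    return {cik: url_template.format(cik=cik) for cik in ciks}


def download_all(urls, user_agent, pause=0.5):
    """Fetch each URL in turn and return {cik: raw bytes}."""
    pages = {}
    for cik, url in urls.items():
        request = urllib.request.Request(url, headers={"User-Agent": user_agent})
        with urllib.request.urlopen(request) as response:
            pages[cik] = response.read()
        time.sleep(pause)  # pause between requests to go easy on the server
    return pages


# Placeholder template and CIK values, for illustration only.
template = "https://example.com/filings?cik={cik}&type=10-K"
urls = build_urls(template, ["1234", "5678"])
print(urls["1234"])
```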





[Ongoing Project] Drawing Supply Chain from SEC 10-k DATA (using python)

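I boldly wrote the title in English, but the body was originally written in Korean…

  • Goal : Draw the supply chain among firms in the United States


  • Data : 10-K data that U.S. firms report to the SEC

To keep firms' revenue information transparent and to flag the risk that comes from revenue concentration, a firm must report in its 10-K any single customer (firm) that accounts for more than 10% of its total revenue.

I cannot recall the relevant regulation at the moment, so I leave it blank. (I remember it taking effect in the early 2000s.)

  • Motivation : Understand the dynamics among firms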
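
The 10% disclosure threshold described above is what turns 10-K data into graph edges. A minimal sketch, assuming made-up revenue figures and an edge direction from supplier to customer (both are assumptions, not taken from the data):

```python
def disclosure_edges(firms, threshold=0.10):
    """Return (supplier, customer, share) for every customer above the threshold.

    firms maps each supplier to (total_revenue, {customer: sales_to_customer}).
    """
    edges = []
    for supplier, (total_revenue, sales) in firms.items():
        for customer, amount in sales.items():
            share = amount / total_revenue
            if share > threshold:  # strictly more than 10% of total revenue
                edges.append((supplier, customer, round(share, 3)))
    return edges


# Hypothetical numbers, for illustration only.
firms = {
    "SupplierA": (1000.0, {"CustomerX": 310.0, "CustomerY": 50.0}),
    "SupplierB": (500.0, {"CustomerX": 50.0}),
}
edges = disclosure_edges(firms)
print(edges)
```

Each resulting triple can be added to a directed graph as an edge, with the revenue share stored as an edge attribute.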