Name File Type Size Last Modified
ghtorrent_users_data.csv text/csv 164.9 MB 01/06/2022 06:50:AM
repo_slugs.csv text/csv 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_08.csv text/csv 1.3 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0809.csv text/csv 3 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0810.csv text/csv 6.1 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0811.csv text/csv 11.5 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0812.csv text/csv 20.5 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0813.csv text/csv 35.5 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0814.csv text/csv 58.2 MB 01/06/2022 06:50:AM
sna_intl_ctr_edgelist_dd_lchn_0815.csv text/csv 94.5 MB 01/06/2022 06:50:AM

Project Citation: 

Kramer, Brandon. International Collaboration in Open Source Software: A Longitudinal Network Analysis of GitHub. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2022-02-06. https://doi.org/10.3886/E158761V1

Project Description

Summary: The project analyzes the evolution of international open-source software collaboration networks on GitHub from 2008 to 2019.
Funding Sources: National Science Foundation

Scope of Project

Geographic Coverage: International
Time Period(s): 1/1/2008 – 12/1/2019 (Start of 2008 - End of 2019)
Collection Date(s): 1/1/2020 – 4/1/2020 (Data collected early 2020)
Universe: Data derives from GitHub repositories that carry an Open Source Initiative (OSI)-approved license and were created between January 2008 and the end of 2019.
Collection Notes: Data derives from GitHub repositories that carry an Open Source Initiative (OSI)-approved license and were created between January 2008 and the end of 2019. The data was collected using the GHOST.jl package, available at https://github.com/uva-bi-sdad/GHOST.jl. After all commits were scraped from these repositories, an edgelist was created for each cumulative period (2008, 2008-2009, 2008-2010, etc.) based on whether users made commits to the same project (i.e., nodes are users, and an edge links two users who committed to a common repository). This data was then joined to a subset of GHTorrent's user data (with supplementary email data) to determine each user's country from location, organization, and/or email.
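
The cumulative edgelist construction described in the collection notes can be illustrated with a minimal Python sketch. This is not the GHOST.jl pipeline itself: the commit records, the field order (user, repository, year), and the function name `cumulative_edgelists` are all hypothetical, and the sketch produces unweighted, undirected edges, which may differ from the layout of the distributed CSV files.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical commit records: (user login, repository slug, commit year).
# The real column names and layout of the scraped data are not given here.
commits = [
    ("alice", "org/repo-a", 2008),
    ("bob", "org/repo-a", 2008),
    ("carol", "org/repo-a", 2009),
    ("bob", "org/repo-b", 2009),
    ("dave", "org/repo-b", 2010),
]


def cumulative_edgelists(commits, first_year, last_year):
    """Return {end_year: set of (user_a, user_b)} for each cumulative window.

    The window for end_year covers first_year through end_year inclusive,
    mirroring the 2008, 2008-2009, 2008-2010, ... scheme.
    """
    edgelists = {}
    for end_year in range(first_year, last_year + 1):
        # Collect every user who committed to each repository in the window.
        users_by_repo = defaultdict(set)
        for user, repo, year in commits:
            if first_year <= year <= end_year:
                users_by_repo[repo].add(user)
        # Link each pair of users who share at least one repository.
        edges = set()
        for users in users_by_repo.values():
            edges.update(combinations(sorted(users), 2))
        edgelists[end_year] = edges
    return edgelists


edgelists = cumulative_edgelists(commits, 2008, 2010)
for end_year, edges in sorted(edgelists.items()):
    print(end_year, sorted(edges))
```

Because each window starts in 2008, every edgelist contains the edges of the one before it, which is consistent with the file sizes growing from the 08 file to the 0815 file.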

Methodology

Data Source: Data derives from GitHub repositories that carry an Open Source Initiative (OSI)-approved license and were created between January 2008 and the end of 2019. The data was collected using the GHOST.jl package, available at https://github.com/uva-bi-sdad/GHOST.jl. After all commits were scraped from these repositories, an edgelist was created for each cumulative period (2008, 2008-2009, 2008-2010, etc.) based on whether users made commits to the same project (i.e., nodes are users, and an edge links two users who committed to a common repository). This data was then joined to a subset of GHTorrent's user data (with supplementary email data) to determine each user's country from location, organization, and/or email.
Collection Mode(s): web scraping
Geographic Unit: International
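
The join between an edgelist and user country data can be sketched in the same way. The lookup table, the country codes, and the function name `label_edges` below are hypothetical; the actual column names in ghtorrent_users_data.csv and the rules used to infer a country from location, organization, or email are not documented on this page.

```python
# Hypothetical user-to-country lookup, standing in for the country field
# derived from GHTorrent location, organization, and/or email data.
country_by_user = {"alice": "US", "bob": "DE", "carol": "US"}

# Hypothetical edgelist: each edge links two users who share a repository.
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "dave")]


def label_edges(edges, country_by_user):
    """Attach both endpoint countries to each edge and classify the tie."""
    labeled = []
    for source, target in edges:
        source_country = country_by_user.get(source)
        target_country = country_by_user.get(target)
        if source_country is None or target_country is None:
            kind = "unknown"  # at least one user has no resolved country
        elif source_country != target_country:
            kind = "international"
        else:
            kind = "domestic"
        labeled.append((source, target, source_country, target_country, kind))
    return labeled


labeled = label_edges(edges, country_by_user)
for row in labeled:
    print(row)
```

Keeping an explicit "unknown" class, rather than dropping those edges, makes it visible how much of the network lacks a resolved country for one or both users.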


This material is distributed exactly as it arrived from the data depositor. ICPSR has not checked or processed this material. Users should consult the investigator(s) if further information is desired.