We use the Twitter-Friends dataset described in the following paper:
Steering Information Diffusion Dynamically against User Attention Limitation. Shuyang Lin, Qingbo Hu, Fengjiao Wang, Philip S. Yu. ICDM ’14.
We provide Python pickles of this data for convenient reuse in code:
all_links.p (Download): A list of tuples of user IDs, one per link in the network, each of the form (a, b), where b follows a.
all_tweets.p (Download): A list of tuples, one per tweet, containing the following information in order:
- Tweet creator ID.
- Tweet creation time (in milliseconds since the epoch).
- Retweeted user ID (-1 if not a retweet).
- Replied-to user ID (-1 if not a user-reply).
- The tweet ID.
- Retweeted tweet ID (-1 if not a retweet).
- Replied-to tweet ID (-1 if not a tweet-reply).
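As a minimal sketch of how these pickles might be read, the snippet below loads a pickle file and gives names to the seven tweet fields. Only the field order comes from the list above; the names `Tweet`, `load_pickle`, `as_tweets`, and `followers_by_producer`, and all field names, are illustrative assumptions.

```python
import pickle
from collections import namedtuple

# Field names are illustrative; only the order is taken from the description.
Tweet = namedtuple(
    "Tweet",
    [
        "creator_id",
        "created_ms",           # milliseconds since the epoch
        "retweeted_user_id",    # -1 if not a retweet
        "replied_to_user_id",   # -1 if not a user-reply
        "tweet_id",
        "retweeted_tweet_id",   # -1 if not a retweet
        "replied_to_tweet_id",  # -1 if not a tweet-reply
    ],
)


def load_pickle(path):
    """Load one pickle file, e.g. all_links.p or all_tweets.p."""
    with open(path, "rb") as f:
        return pickle.load(f)


def as_tweets(rows):
    """Wrap raw 7-tuples in named tuples for readable field access."""
    return [Tweet(*row) for row in rows]


def followers_by_producer(links):
    """Map each user a to the set of users b that follow a."""
    followers = {}
    for a, b in links:
        followers.setdefault(a, set()).add(b)
    return followers
```

Typical use would be `tweets = as_tweets(load_pickle("all_tweets.p"))`. If the files were written under Python 2, which is an assumption worth checking, `pickle.load(f, encoding="latin1")` may be needed instead.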
Attention potential is a directed function ap(a,b) that estimates the attention follower b gives to producer a. Reactions are likewise directed: reactions(a,b) counts the retweets of and replies to a’s tweets by b. The code linked above evaluates the correlation between attention potential and reactions in the Twitter dataset.
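The following sketch shows one way reactions(a,b) could be counted from the tweet tuples and compared with attention potential. It is not the linked code: the attention values are taken as a given dictionary `ap` keyed by (a, b), since their computation is not described here, and the use of a Pearson coefficient is an assumption. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def count_reactions(tweets):
    """Count retweets and user-replies by b directed at a, keyed by (a, b)."""
    reactions = Counter()
    for creator, _time, rt_user, reply_user, *_ in tweets:
        if rt_user != -1:
            reactions[(rt_user, creator)] += 1
        if reply_user != -1:
            reactions[(reply_user, creator)] += 1
    return reactions


def pearson(xs, ys):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def attention_reaction_correlation(ap, reactions):
    """Correlate ap(a,b) with reactions(a,b) over every pair present in ap.

    Pairs with no recorded reaction count as zero reactions.
    """
    pairs = sorted(ap)
    return pearson(
        [ap[p] for p in pairs],
        [reactions.get(p, 0) for p in pairs],
    )
```

Restricting the comparison to pairs present in `ap` is a design choice of this sketch; restricting it to follower links from all_links.p would be an equally plausible alternative.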
The code linked above computes statistics of author-occurrence clusters. For each cluster size, it reports the following:
- The number of clusters of this size.
- The number of tweets arising from clusters of this size.
- The number of tweets reacted to in clusters of this size.
- The number of clusters of this size containing at least one reaction.
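The four per-size statistics above can be aggregated roughly as follows. This sketch does not construct the clusters, whose definition is not given here. It assumes each cluster arrives with a precomputed `size` and a list of its tweet IDs, and that a tweet counts as reacted to when some other tweet names it as its retweeted or replied-to tweet. `Cluster`, `reacted_tweet_ids`, and `cluster_stats` are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Cluster:
    size: int        # cluster size, however the analysis defines it (assumed given)
    tweet_ids: list  # IDs of the tweets belonging to this cluster


def reacted_tweet_ids(tweets):
    """Collect IDs of tweets that were retweeted or replied to at least once."""
    reacted = set()
    for *_, rt_tweet, reply_tweet in tweets:
        if rt_tweet != -1:
            reacted.add(rt_tweet)
        if reply_tweet != -1:
            reacted.add(reply_tweet)
    return reacted


def cluster_stats(clusters, reacted):
    """Aggregate the four reported statistics, keyed by cluster size."""
    stats = defaultdict(
        lambda: {
            "clusters": 0,
            "tweets": 0,
            "reacted_tweets": 0,
            "clusters_with_reaction": 0,
        }
    )
    for cluster in clusters:
        hits = sum(1 for t in cluster.tweet_ids if t in reacted)
        row = stats[cluster.size]
        row["clusters"] += 1
        row["tweets"] += len(cluster.tweet_ids)
        row["reacted_tweets"] += hits
        row["clusters_with_reaction"] += int(hits > 0)
    return dict(stats)
```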